TRUNCATED GALNACT2 POLYPEPTIDES AND NUCLEIC ACIDS

CROSS-REFERENCES TO RELATED APPLICATIONS 
[0001] This application claims the benefit of U.S. Provisional Application No. 60/576,530, 
filed June 3, 2004, and U.S. Provisional Application No. 60/598,584, filed August 3, 2004;
both of which are herein incorporated by reference for all purposes. 

FIELD OF THE INVENTION 
[0002] The present invention features compositions and methods related to truncated 
mutants of GalNAcT2. In particular, the invention features truncated human GalNAcT2 
polypeptides. The invention also features nucleic acids encoding such truncated
polypeptides, as well as vectors, host cells, expression systems, and methods of expressing
and using such polypeptides. 

BACKGROUND OF THE INVENTION 
[0003] A great diversity of oligosaccharide structures and many types of glycopeptides are
found in nature, and these are synthesized, in part, by a large number of glycosyltransferases.
Glycosyltransferases catalyze the synthesis of glycolipids, glycopeptides, and
polysaccharides by transferring an activated mono- or oligosaccharide residue to an existing
acceptor molecule for the initiation or elongation of the carbohydrate chain. A catalytic
reaction is believed to involve the recognition of both the donor and acceptor by suitable
domains, as well as the catalytic site of the enzyme.

[0004] Many peptide therapeutics, and many potential peptide therapeutics, are 
glycosylated peptides. The production of a recombinant glycopeptide, as opposed to a
recombinant non-glycosylated peptide, requires that a recombinantly-produced peptide be
subjected to additional processing steps, either within the cell or after the peptide is produced
by the cell, where the processing steps are performed in vitro. The peptide can be treated
enzymatically to introduce one or more glycosyl groups onto the peptide, using a 
glycosyltransferase. Specifically, the glycosyltransferase covalently attaches the glycosyl 
group or groups to the peptide. 

[0005] The extra in vitro steps of peptide processing to produce a glycopeptide can be time
consuming and costly. This is due, in part, to the burden and cost of producing recombinant
glycosyltransferases for the in vitro glycosylation of peptides and glycopeptides to produce
glycopeptide therapeutics. As the demand for and usefulness of recombinant glycotherapeutics
increase, new methods are required in order to more efficiently prepare glycopeptides.
Moreover, as more and more glycopeptides are discovered to be useful for the treatment of a
variety of diseases, there is a need for methods that lower the cost of their production.
Further, there is also a need in the art to develop methods of more efficiently producing
recombinant glycopeptides for use in developing and improving glycopeptide therapeutics.

[0006] Glycosyltransferases are reviewed in general in International (PCT) Patent
Application No. WO03/031464 (PCT/US02/32263), which is incorporated herein by
reference in its entirety. One such particular glycosyltransferase that has utility in the
development and production of therapeutic glycopeptides is GalNAcT2. GalNAcT2, or
N-acetyl-D-galactosamine transferase, catalyzes the transfer of GalNAc from a GalNAc donor
to a GalNAc acceptor. Full length human GalNAcT2 enzyme is disclosed by Bennett et al.
(1996, J Biol Chem. 271:17006-17012). However, the identification of useful mutants of this
enzyme, having enhanced biological activity such as enhanced catalytic activity or enhanced
stability, has not heretofore been reported.

[0007] In the past, there have been efforts to increase the availability of recombinant
glycosyltransferases for the in vitro production of glycopeptides. A limited amount of work
has been done with respect to recombinant glycosyltransferases that may sometimes be
suitable for small-scale production of oligosaccharides or glycopeptides. For example, White
et al. have disclosed a soluble form of human GalNAcT2 (1995, J. Biol. Chem.,
270:24156-24165). Additionally, Kurosawa et al. (1994, J Biol Chem. 269:1402-1409) describe a
truncation mutant of chicken GalNAcα2,6-sialyltransferase (ST6GalNAcI) lacking amino
acid residues 1-232 from the full-length enzyme. However, the truncated enzyme described
by Kurosawa et al. lacks the substrate specificity of other ST6GalNAcI enzymes. Therefore,
a need still exists for recombinant glycosyltransferases having activity that is suitable for
"pharmaceutical-scale" processes and reactions, including the production of glycopeptide
therapeutics. In particular, there is a need for recombinant glycosyltransferases having
favorable functional and structural characteristics. Further, a need exists for efficient
methods of identification and characterization of recombinant glycosyltransferases, as well as
for the production of such glycosyltransferases. The present invention addresses and meets
these needs.

BRIEF SUMMARY OF THE INVENTION
[0008] In one aspect, the present invention provides an isolated nucleic acid comprising a
nucleic acid sequence that encodes a truncated human GalNAcT2 polypeptide. The truncated
human GalNAcT2 polypeptide lacks all or a portion of the GalNAcT2 signal domain, or in
addition lacks all or a portion of the GalNAcT2 transmembrane domain, or in addition lacks all
or a portion of the GalNAcT2 stem domain; with the proviso that the encoded polypeptide is not
a human GalNAcT2 truncation mutant polypeptide lacking amino acid residues 1-51.

[0009] In one embodiment, the isolated nucleic acid comprises a nucleic acid sequence 
having at least 90% identity with a nucleic acid selected from the group consisting of SEQ ID 
NO:3, SEQ ID NO:7 and SEQ ID NO:9. In another embodiment, the isolated nucleic acid
comprises a nucleic acid sequence having at least 95% identity with a nucleic acid selected 
from the group consisting of SEQ ID NO:3, SEQ ID NO:7 and SEQ ID NO:9. In a further 
embodiment, the isolated nucleic acid comprises a nucleic acid sequence selected from SEQ 
ID NO:3, SEQ ID NO:7 and SEQ ID NO:9. 

[0010] In some embodiments, the isolated nucleic acid is an isolated chimeric nucleic acid
encoding a fusion polypeptide. The fusion polypeptide can include a tag polypeptide
covalently linked to a truncated human GalNAcT2 polypeptide, as described herein. 
Examples of tag polypeptides include a maltose binding protein, a histidine tag, a Factor IX 
tag, a glutathione-S-transferase tag, a FLAG-tag, and a starch binding domain tag. 

[0011] In another aspect, the invention provides an isolated truncated human GalNAcT2
polypeptide that lacks all or a portion of the GalNAcT2 signal domain, or in addition lacks
all or a portion of the GalNAcT2 transmembrane domain, or in addition lacks all or a portion of the
GalNAcT2 stem domain; with the proviso that the encoded polypeptide is not a human
GalNAcT2 truncation mutant polypeptide lacking amino acid residues 1-51. In one
embodiment, the isolated truncated human GalNAcT2 polypeptide has at least 90% or 95%
identity with a polypeptide selected from the group consisting of SEQ ID NO:4, SEQ ID
NO:8 and SEQ ID NO:10. In a further aspect, the isolated truncated human GalNAcT2
polypeptide comprises an amino acid sequence selected from SEQ ID NO:4, SEQ ID NO:8
and SEQ ID NO:10.

[0012] In some embodiments, the isolated truncated GalNAcT2 polypeptide is an isolated
chimeric polypeptide comprising a tag polypeptide covalently linked to the isolated truncated
GalNAcT2. Examples of tag polypeptides include a maltose binding protein, a histidine tag,
a Factor IX tag, a glutathione-S-transferase tag, a FLAG-tag, and a starch binding domain
tag.

[0013] The isolated nucleic acid encoding a truncated GalNAcT2 polypeptide can also be
operably linked to a promoter/regulatory sequence, within, e.g., an expression vector. The
invention also includes host cells that comprise such expression vectors. Host cells can be,
e.g., eukaryotic or prokaryotic cells. Eukaryotic cells include, e.g., mammalian cells,
insect cells, and fungal cells. Some preferred insect host cells are SF9 cells, SF9+
cells, SF21 cells, HIGH FIVE cells or Drosophila Schneider S2 cells. Prokaryotic host
cells include, e.g., E. coli cells and B. subtilis cells.

[0014] The host cells can be used to produce a truncated human GalNAcT2 polypeptide,
by growing the recombinant host cells under conditions suitable for expression of the
truncated human GalNAcT2 polypeptide. In preferred embodiments, sufficient truncated 
human GalNAcT2 polypeptide is made to allow commercial scale production of a 
glycoprotein or glycopeptide. 

[0015] In a further aspect, the invention includes a method of catalyzing the transfer of a
GalNAc moiety to an acceptor moiety comprising incubating the truncated human GalNAcT2
polypeptide with a GalNAc moiety and an acceptor moiety, wherein said polypeptide
mediates the covalent linkage of said GalNAc moiety to said acceptor moiety, thereby
catalyzing the transfer of a GalNAc moiety to an acceptor moiety to produce a product
saccharide, or a product glycoprotein, or a product glycopeptide. In one embodiment, the
acceptor moiety is a granulocyte colony stimulating factor (G-CSF) protein. In another
embodiment, the acceptor moiety is selected from erythropoietin, human growth hormone,
granulocyte colony stimulating factor, interferons alpha, -beta, and -gamma, Factor IX,
follicle stimulating hormone, interleukin-2, erythropoietin, anti-TNF-alpha, and a lysosomal
hydrolase. In a further embodiment, the polypeptide acceptor is a glycopeptide. In some
embodiments, the GalNAc moiety comprises a polyethylene glycol moiety. In another
embodiment, the product saccharide, product glycoprotein, or product glycopeptide is
produced on a commercial scale.

BRIEF DESCRIPTION OF THE DRAWINGS 
[0016] For the purpose of illustrating the invention, there are depicted in the drawings certain
embodiments of the invention. However, the invention is not limited to the precise 
arrangements and instrumentalities of the embodiments depicted in the drawings. 

[0017] Figure 1 is an image of an electrophoretic gel illustrating the PCR amplification of
ppGalNAcT2 genes. M, 1 kb DNA ladder; PCR1, PCR product for ppGalNAcT2-N41R
(1596 bp); PCR2, PCR product for ppGalNAcT2-N52K (1563 bp); PCR3, PCR product for
ppGalNAcT2-N74G (1497 bp); PCR4, PCR product for ppGalNAcT2-N95G (1434 bp).

[0018] Figure 2A is a plasmid restriction map for the pCWin2MBP vector.

[0019] Figure 2B is an image of an electrophoretic gel illustrating the fragments resulting
from multiple samples of the pCWin2MBP vector digested by both BamHI and XhoI
restriction enzymes.

[0020] Figure 3 is an image of an electrophoretic gel illustrating the screening of DH5α
(pCWin2MBP-ppGalNAcT2) colonies by restriction mapping (BamHI and XhoI digestion)
for plasmid purified from twelve colonies. Lane M, bp ladder. Lanes 1-3, N41R; lanes 4-6,
N52K; lanes 8-10, N74G; lanes 11-13, N95G.

[0021] Figure 4 is an image of an electrophoretic protein gel illustrating SDS-PAGE for
JM109 (pCWin2MBP-ppGalNAcT2) whole cell lysates after IPTG induction as described
elsewhere herein. M, Pre-Stained MW Standard; Lane 13, IPTG-induced JM109
(pCWin2MBP); Lanes 1-12, protein in whole cells for colonies 1-12; Lanes 1-3, JM109
(pCWin2MBP-ppGalNAcT2N41R); Lanes 4-6, JM109 (pCWin2MBP-ppGalNAcT2N52K);
Lanes 7-9, JM109 (pCWin2MBP-ppGalNAcT2N74G); Lanes 10-12, JM109
(pCWin2MBP-ppGalNAcT2N95G).

[0022] Figure 5 is an image of an electrophoretic protein gel illustrating SDS-PAGE for
JM109 (pCWin2MBP-ppGalNAcT2) cell lysates. M, Pre-Stained MW Standard; Lane 13,
lysate from JM109 (pCWin2MBP); Lanes 1-12, lysates from colonies 1-12; Lanes 1-3,
JM109 (pCWin2MBP-ppGalNAcT2N41R); Lanes 4-6, JM109
(pCWin2MBP-ppGalNAcT2N52K); Lanes 7-9, JM109 (pCWin2MBP-ppGalNAcT2N74G); Lanes 10-12,
JM109 (pCWin2MBP-ppGalNAcT2N95G).

[0023] Figure 6 is an image of an electrophoretic protein gel illustrating SDS-PAGE for
inclusion bodies isolated from JM109 (pCWin2MBP-ppGalNAcT2) cells. M, Pre-Stained
MW Standard; Lane 13, inclusion bodies from JM109 (pCWin2MBP); Lanes 1-12, inclusion
bodies from colonies 1-12; Lanes 1-3, JM109 (pCWin2MBP-ppGalNAcT2N41R); Lanes 4-6,
JM109 (pCWin2MBP-ppGalNAcT2N52K); Lanes 7-9, JM109
(pCWin2MBP-ppGalNAcT2N74G); Lanes 10-12, JM109 (pCWin2MBP-ppGalNAcT2N95G).

[0024] Figure 7 is an image of an electrophoretic gel illustrating the protein expression
pattern in lysates of cells containing human GalNAcT2 constructs. Lane 1, molecular weight
marker; lane 2, construct 1 culture before induction; lane 3, construct 1 culture after
induction; lane 4, construct 2 culture before induction; lane 5, construct 2 culture after
induction; lane 6, construct 3 culture before induction; lane 7, construct 3 culture after
induction; lane 8, construct 4 culture before induction; lane 9, construct 4 culture after
induction; lane 10, empty.

[0025] Figure 8 is an image of an electrophoretic protein gel illustrating the protein content 
of inclusion bodies from JM109 pCWin2 MBP-GalNAcT2 constructs. Lane 1, MW marker; 
lane 2, JM109 pCWin2 MBP-GalNAcT2 construct 1 inclusion bodies; lane 3, JM109
pCWin2 MBP-GalNAcT2 construct 2 inclusion bodies. 

[0026] Figure 9 is an image of an electrophoretic protein gel illustrating the
glycoPEGylation of G-CSF by Δ51 GalNAcT2-MBP. Lane 1, glycoPEGylation in the
presence of 1 mg/ml G-CSF; lane 2, glycoPEGylation in the presence of 0.7 mg/ml G-CSF;
lane 3, glycoPEGylation in the presence of 0.4 mg/ml G-CSF; lane 4, glycoPEGylation in the
presence of 0.2 mg/ml G-CSF. The glycoPEGylated G-CSF is visible around 60 kDa.

[0027] Figures 10A and 10B depict a nucleic acid sequence encoding a Δ40 GalNAcT2
polypeptide.

[0028] Figures 11A and 11B depict a nucleic acid sequence encoding a Δ51 GalNAcT2
polypeptide.

[0029] Figures 12A and 12B depict a nucleic acid sequence encoding a Δ73 GalNAcT2
polypeptide.

[0030] Figures 13A and 13B depict a nucleic acid sequence encoding a Δ94 GalNAcT2
polypeptide.

[0031] Figure 14A is an image of a chromatogram illustrating the elution of Δ51
GalNAcT2-MBP that was refolded at pH 5.5 and subsequently eluted from a Q-sepharose
fast flow column. Fraction numbers are indicated on the X-axis and the relative absorbance
of each fraction is indicated on the Y-axis.

[0032] Figure 14B is an image of two electrophoretic gels used to visualize the eluted
fractions set forth in Figure 14A. The contents of each lane on the gel are described in the
figure. Figure 14C is a table illustrating the relative GalNAc transferase activity of the
fractions set forth in Figure 14A.

[0033] Figure 15A is an image of a chromatogram illustrating the elution of Δ51
GalNAcT2-MBP that was refolded at pH 6.5 and subsequently eluted from a Q-sepharose
fast flow column. Fraction numbers are indicated on the X-axis and the relative absorbance
of each fraction is indicated on the Y-axis.

[0034] Figure 15B is an image of two electrophoretic gels used to visualize the eluted
fractions set forth in Figure 15A. The contents of each lane on the gel are described in the
figure.

[0035] Figure 15C is a table illustrating the relative GalNAc transferase activity of the
fractions set forth in Figure 15A.

[0036] Figure 16A is an image of a chromatogram illustrating the elution of Δ51
GalNAcT2-MBP that was refolded at pH 8.0 and subsequently eluted from a Q-sepharose
fast flow column. Fraction numbers are indicated on the X-axis and the relative absorbance
of each fraction is indicated on the Y-axis.

[0037] Figure 16B is an image of two electrophoretic gels used to visualize the eluted
fractions set forth in Figure 16A. The contents of each lane on the gel are described in the
figure.

[0038] Figure 16C is a table illustrating the relative GalNAc transferase activity of the
fractions set forth in Figure 16A.

[0039] Figure 17A is an image of a chromatogram illustrating the elution of Δ51
GalNAcT2-MBP that was refolded at pH 8.5 and subsequently eluted from a Q-sepharose
fast flow column. Fraction numbers are indicated on the X-axis and the relative absorbance
of each fraction is indicated on the Y-axis.

[0040] Figure 17B is an image of two electrophoretic gels used to visualize the eluted
fractions set forth in Figure 17A. The contents of each lane on the gel are described in the
figure.

[0041] Figure 17C is a table illustrating the relative GalNAc transferase activity of the
fractions set forth in Figure 17A.

[0042] Figure 18A is an image of a chromatogram illustrating the elution of Δ51
GalNAcT2-MBP that was refolded at pH 8.0 and subsequently eluted from a Q-sepharose
fast flow column. Fraction numbers are indicated on the X-axis and the relative absorbance
of each fraction is indicated on the Y-axis.

[0043] Figure 18B is an image of two electrophoretic gels used to visualize the eluted
fractions set forth in Figure 18A. The contents of each lane on the gel are described in the
figure.

[0044] Figure 18C is a table illustrating the relative GalNAc transferase activity of the
fractions set forth in Figure 18A.

[0045] Figure 19A is an image of a chromatogram illustrating the elution of Δ51
GalNAcT2-MBP from a Q-sepharose fast flow column. Fraction numbers are indicated on
the X-axis and the relative absorbance of each fraction is indicated on the Y-axis.

[0046] Figure 19B is an image of two electrophoretic gels used to visualize the eluted
fractions set forth in Figure 19A. The contents of each lane on the gel are described in the
figure and correspond to the chromatogram of Figure 19A.

[0047] Figure 19C is a table illustrating the relative GalNAc transferase activity of the
fractions set forth in Figure 19A.

[0048] Figure 20A is an image of a chromatogram illustrating the elution of Δ51
GalNAcT2-MBP from a Q-sepharose XL column, using 5 mM NaCl. Fraction numbers are
indicated on the X-axis and the relative absorbance of each fraction is indicated on the
Y-axis.

[0049] Figure 20B is an image of two electrophoretic gels used to visualize the eluted
fractions set forth in Figure 20A. The contents of each lane on the gel are described in the
figure and correspond to the chromatogram of Figure 20A.

[0050] Figure 20C is a table illustrating the relative GalNAc transferase activity of the
fractions set forth in Figure 20A.

[0051] Figure 21A is an image of a chromatogram illustrating the elution of Δ51
GalNAcT2-MBP from a Q-sepharose XL column, using 50 mM NaCl. Fraction numbers are
indicated on the X-axis and the relative absorbance of each fraction is indicated on the
Y-axis.

[0052] Figure 21B is an image of two electrophoretic gels used to visualize the eluted
fractions set forth in Figure 21A. The contents of each lane on the gel are described in the
figure and correspond to the chromatogram of Figure 21A.

[0053] Figure 21C is a table illustrating the relative GalNAc transferase activity of the
fractions set forth in Figure 21A.

[0054] Figure 22A is an image of a chromatogram illustrating the elution of Δ51
GalNAcT2-MBP from a Q-sepharose XL column, using 100 mM NaCl. Fraction numbers
are indicated on the X-axis and the relative absorbance of each fraction is indicated on the
Y-axis.

[0055] Figure 22B is an image of two electrophoretic gels used to visualize the eluted
fractions set forth in Figure 22A. The contents of each lane on the gel are described in the
figure and correspond to the chromatogram of Figure 22A.

[0056] Figure 22C is a table illustrating the relative GalNAc transferase activity of the
fractions set forth in Figure 22A.

[0057] Figure 23A is an image of a chromatogram illustrating the elution of Δ51
GalNAcT2-MBP from a Q-sepharose XL column, using 200 mM NaCl. Fraction numbers
are indicated on the X-axis and the relative absorbance of each fraction is indicated on the
Y-axis.

[0058] Figure 23B is an image of two electrophoretic gels used to visualize the eluted
fractions set forth in Figure 23A. The contents of each lane on the gel are described in the
figure and correspond to the chromatogram of Figure 23A.

[0059] Figure 23C is a table illustrating the relative GalNAc transferase activity of the
fractions set forth in Figure 23A.

[0060] Figure 24A is an image of a chromatogram illustrating the elution of Δ51
GalNAcT2-MBP from a Hydroxyapatite Type I column. Fraction numbers are indicated on
the X-axis and the relative absorbance of each fraction is indicated on the Y-axis.

[0061] Figure 24B is an image of an electrophoretic gel used to visualize the eluted
fractions set forth in Figure 24A. The contents of each lane on the gel are described in the
figure and correspond to the chromatogram of Figure 24A.

[0062] Figure 24C is a table illustrating the relative GalNAc transferase activity of the
fractions set forth in Figure 24A.

[0063] Figure 25 is a graph illustrating the relative GalNAc transferase activity of various
preparations of refolded Δ51 GalNAcT2-MBP. The refolding conditions of each preparation
are indicated on the X-axis, and the relative GalNAc transferase activity is illustrated on the
Y-axis.

[0064] Figure 26 is a graph illustrating the relative GalNAc transferase activity of various
preparations of refolded Δ51 GalNAcT2-MBP. The refolding conditions of each preparation
are indicated on the X-axis, and the relative GalNAc transferase activity is illustrated on the
Y-axis.

[0065] Figure 27 is an image of three MALDI-TOF spectra demonstrating GalNAc transfer
to GCSF mediated by Δ51 GalNAcT2-MBP that has been refolded and purified according to
the present invention.

[0066] Figure 28 is an image of three MALDI-TOF spectra demonstrating GalNAc transfer
to GCSF mediated by Δ51 GalNAcT2-MBP that has been refolded and purified according to
the present invention.

DETAILED DESCRIPTION OF THE INVENTION 
[0067] The compositions and methods of the present invention encompass truncation
mutants of human GalNAcT2 polypeptides, isolated nucleic acids encoding these proteins,
and methods of their use. GalNAcT2 polypeptides catalyze the transfer of a GalNAc from a 
GalNAc donor to a GalNAc acceptor. 

[0068] The glycosyltransferase GalNAcT2 is an essential reagent for glycosylation of
therapeutic glycopeptides. Additionally, GalNAcT2 is an important reagent for research and
development of therapeutically important glycopeptides and oligosaccharide therapeutics.
GalNAcT2 enzymes are typically isolated and purified from natural sources, or from tedious
and costly in vitro and recombinant sources. The present invention provides compositions
and methods relating to simplified and more cost-effective methods of production of
GalNAcT2 enzymes. In particular, the present invention provides compositions and methods
relating to truncated GalNAcT2 enzymes that have improved and useful properties in
comparison to their full-length enzyme counterparts.






WO 2005/121331



PCT/US2005/019442 



[0069] Truncated glycosyltransferase enzymes of the present invention are useful for in 
vivo and in vitro preparation of glycosylated peptides, as well as for the production of 
oligosaccharides containing the specific glycosyl residues that can be transferred by the 
truncated glycosyltransferase enzymes of the present invention. This is because it is shown 
for the first time herein that truncated forms of GalNAcT2 polypeptides possess biological
activities comparable to, and in some instances, in excess of their full-length polypeptide
counterparts. The present application also discloses that such truncation mutants not only
possess biological activity, but also that the truncation mutants may have enhanced properties
of solubility, stability and resistance to proteolytic degradation. 

Definitions

[0070] Unless defined otherwise, all technical and scientific terms used herein have the 
same meaning as commonly understood by one of ordinary skill in the art to which this 
invention belongs. Although any methods and materials similar or equivalent to those 
described herein can be used in the practice or testing of the present invention, the preferred 
methods and materials are described herein.

[0071] Certain abbreviations are used herein as are common in the art, such as: "Ac" for
acetyl; "Glc" for glucose; "GlcN" for glucosamine; "GlcA" for glucuronic acid; "IdoA" for
iduronic acid; "GlcNAc" for N-acetylglucosamine; "NAN" or "sialic acid" or "SA" for
N-acetyl neuraminic acid; "UDP" for uridine diphosphate; "CMP" for cytidine monophosphate.

[0072] As used herein, each of the following terms has the meaning associated with it in
this section. 

[0073] The articles "a" and "an" are used herein to refer to one or to more than one (i.e. to 
at least one) of the grammatical object of the article. By way of example, "an element" 
means one element or more than one element. 

[0074] "Encoding" refers to the inherent property of specific sequences of nucleotides in a
nucleic acid, such as a gene, a cDNA, or an mRNA, to serve as templates for synthesis of
other polymers and macromolecules in biological processes having either a defined sequence
of nucleotides (i.e., rRNA, tRNA and mRNA) or a defined sequence of amino acids and the
biological properties resulting therefrom. Thus, a gene encodes a protein if transcription and
translation of mRNA corresponding to that gene produces the protein in a cell or other
biological system. Both the coding strand, the nucleotide sequence of which is identical to
the mRNA sequence and is usually provided in sequence listings, and the non-coding strand,
used as the template for transcription of a gene or cDNA, can be referred to as encoding the
protein or other product of that gene or cDNA.

[0075] A "coding region" of a gene consists of the nucleotide residues of the coding strand 
of the gene and the nucleotides of the non-coding strand of the gene which are homologous 
with or complementary to, respectively, the coding region of an mRNA molecule which is
produced by transcription of the gene. 

[0076] A "coding region" of an mRNA molecule also consists of the nucleotide residues of 
the mRNA molecule which are matched with an anticodon region of a transfer RNA 
molecule during translation of the mRNA molecule or which encode a stop codon. The 
coding region may thus include nucleotide residues corresponding to amino acid residues
which are not present in the mature protein encoded by the mRNA molecule (e.g., amino acid
residues in a protein export signal sequence).

[0077] An "affinity tag" is a peptide or polypeptide that may be genetically or chemically 
fused to a second polypeptide for the purposes of purification, isolation, targeting, trafficking, 
or identification of the second polypeptide. The "genetic" attachment of an affinity tag to a
second protein may be effected by cloning a nucleic acid encoding the affinity tag adjacent to 
a nucleic acid encoding a second protein in a nucleic acid vector. 

[0078] As used herein, the term "glycosyltransferase" refers to any enzyme/protein that
has the ability to transfer a donor sugar to an acceptor moiety. 

[0079] A "sugar nucleotide-generating enzyme" is an enzyme that has the ability to
produce a sugar nucleotide. Sugar nucleotides are known in the art, and include, but are not
limited to, such moieties as UDP-Gal, UDP-GalNAc, and CMP-NAN.

[0080] An "isolated nucleic acid" refers to a nucleic acid segment or fragment which has
been separated from sequences which flank it in a naturally occurring state, e.g., a DNA
fragment which has been removed from the sequences which are normally adjacent to the
fragment, e.g., the sequences adjacent to the fragment in a genome in which it naturally
occurs. The term also applies to nucleic acids which have been substantially purified from
other components which naturally accompany the nucleic acid, e.g., RNA or DNA or
proteins, which naturally accompany it in the cell. The term therefore includes, for example,
a recombinant DNA which is incorporated into a vector, into an autonomously replicating
plasmid or virus, or into the genomic DNA of a prokaryote or eukaryote, or which exists as a
separate molecule (e.g., as a cDNA or a genomic or cDNA fragment produced by PCR or
restriction enzyme digestion) independent of other sequences. It also includes a recombinant
DNA which is part of a hybrid gene encoding additional polypeptide sequence.

[0081] In the context of the present invention, the following abbreviations for the 
commonly occurring nucleic acid bases are used: "A" refers to adenosine, "C" refers to
cytidine, "G" refers to guanosine, "T" refers to thymidine, and "U" refers to uridine. 

[0082] A "polynucleotide" means a single strand or parallel and anti-parallel strands of a 
nucleic acid. Thus, a polynucleotide may be either a single-stranded or a double-stranded 
nucleic acid. 

[0083] The term "nucleic acid" typically refers to large polynucleotides. However, the 
terms "nucleic acid" and "polynucleotide" are used interchangeably herein. 

[0084] The term "oligonucleotide" typically refers to short polynucleotides, generally no 
greater than about 50 nucleotides. It will be understood that when a nucleotide sequence is 
represented by a DNA sequence (i.e., A, T, G, C), this also includes an RNA sequence (i.e.,
A, U, G, C) in which "U" replaces "T." 
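
The convention stated above can be illustrated with a minimal Python sketch. The function name `dna_to_rna` and the input check are assumptions made for illustration only and are not part of the disclosure; the example sequence is an arbitrary short string.

```python
def dna_to_rna(dna: str) -> str:
    """Return the RNA reading of a DNA sequence by replacing T with U.

    Hypothetical helper for illustration; it assumes the input uses only
    the four DNA letters A, T, G and C.
    """
    seq = dna.upper()
    unexpected = set(seq) - set("ATGC")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq.replace("T", "U")


if __name__ == "__main__":
    # The DNA sequence ATTGCC is also read as the RNA sequence AUUGCC.
    print(dna_to_rna("ATTGCC"))
```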

[0085] Conventional notation is used herein to describe nucleic acid sequences: the left- 
hand end of a single-stranded nucleic acid sequence is the 5' end; the left-hand direction of a 
double-stranded nucleic acid sequence is referred to as the 5'-direction. 

[0086] A first defined nucleic acid sequence is said to be "immediately adjacent to" a
second defined nucleic acid sequence when, for example, the last nucleotide of the first
nucleic acid sequence is chemically bonded to the first nucleotide of the second nucleic acid
sequence through a phosphodiester bond. Conversely, a first defined nucleic acid sequence is
also said to be "immediately adjacent to" a second defined nucleic acid sequence when, for
example, the first nucleotide of the first nucleic acid sequence is chemically bonded to the
last nucleotide of the second nucleic acid sequence through a phosphodiester bond.

[0087] A first defined polypeptide sequence is said to be "immediately adjacent to" a 
second defined polypeptide sequence when, for example, the last amino acid of the first 
polypeptide sequence is chemically bonded to the first amino acid of the second polypeptide 
sequence through a peptide bond. Conversely, a first defined polypeptide sequence is said to 
be "immediately adjacent to" a second defined polypeptide sequence when, for example, the 
first amino acid of the first polypeptide sequence is chemically bonded to the last amino acid
of the second polypeptide sequence through a peptide bond. 

[0088] The direction of 5' to 3' addition of nucleotides to nascent RNA transcripts is 
referred to as the transcription direction. The DNA strand having the same sequence as an 
mRNA is referred to as the "coding strand"; sequences on the DNA strand which are located 
5' to a reference point on the DNA are referred to as "upstream sequences"; sequences on the 
DNA strand which are 3' to a reference point on the DNA are referred to as "downstream 
sequences." 

[0089] Unless otherwise specified, a "nucleotide sequence encoding an amino acid 
sequence" includes all nucleotide sequences that are degenerate versions of each other and 
that encode the same amino acid sequence. Nucleotide sequences that encode proteins and 
RNA may include introns. 
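
The notion of degenerate versions can be illustrated with a minimal Python sketch. The codon table below is only a partial excerpt of the standard genetic code, and the two example sequences and the function names are hypothetical, chosen for illustration only; they are not taken from the sequence listing.

```python
# Partial excerpt of the standard genetic code (DNA codons, coding strand).
# Only the codons needed for the illustration are listed.
CODON_TABLE = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # alanine
    "GAA": "E", "GAG": "E",                          # glutamic acid
    "AAA": "K", "AAG": "K",                          # lysine
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",  # leucine
    "TTA": "L", "TTG": "L",                          # leucine
}


def translate(dna: str) -> str:
    """Translate a coding-strand sequence, read 5' to 3', codon by codon."""
    dna = dna.upper().replace("U", "T")  # an RNA sequence is accepted as well
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def are_degenerate_versions(seq_a: str, seq_b: str) -> bool:
    """True when both sequences encode the same amino acid sequence."""
    return translate(seq_a) == translate(seq_b)


# Hypothetical example sequences: different nucleotides, same encoded peptide.
seq_1 = "GCTGAAAAACTG"  # GCT GAA AAA CTG
seq_2 = "GCCGAGAAGTTA"  # GCC GAG AAG TTA
print(translate(seq_1), translate(seq_2), are_degenerate_versions(seq_1, seq_2))
```

Both example sequences translate to Ala-Glu-Lys-Leu even though they differ at four nucleotide positions, which is the sense in which they are degenerate versions of each other.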

[0090] "Homology" as used herein, refers to nucleotide sequence similarity between two 
regions of the same nucleic acid strand or between regions of two different nucleic acid 
strands. When a nucleotide residue position in both regions is occupied by the same 
nucleotide residue, then the regions are homologous at that position. A first region is 
homologous to a second region if at least one nucleotide residue position of each region is 
occupied by the same residue. Homology between two regions is expressed in terms of the 
proportion of nucleotide residue positions of the two regions that are occupied by the same 
nucleotide residue. By way of example, a region having the nucleotide sequence 
5'-ATTGCC-3' and a region having the nucleotide sequence 5'-TATGGC-3' share 50% 
homology. Preferably, the first region comprises a first portion and the second region 
comprises a second portion, whereby, at least about 50%, and preferably at least about 75%, 
at least about 90%, or at least about 95% of the nucleotide residue positions of each of the 
portions are occupied by the same nucleotide residue. More preferably, all nucleotide residue 
positions of each of the portions are occupied by the same nucleotide residue. 
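
The position-by-position calculation described in this definition can be sketched in Python as follows. This is a minimal illustration for two ungapped regions of equal length; the function name is hypothetical.

```python
def percent_homology(region_a: str, region_b: str) -> float:
    """Percentage of positions occupied by the same nucleotide residue.

    Both regions are read 5' to 3' and compared position by position,
    without gaps, so they must have the same length.
    """
    if not region_a or len(region_a) != len(region_b):
        raise ValueError("regions must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(region_a.upper(), region_b.upper()))
    return 100.0 * matches / len(region_a)


# The example from the definition: positions 3, 4 and 6 match (T, G and C).
print(percent_homology("ATTGCC", "TATGGC"))  # 50.0
```

Three of the six positions match, giving the 50% figure stated in the definition.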

[0091] As used herein, "percent identity" is used synonymously with "homology." The 
determination of percent identity between two nucleotide or amino acid sequences can be 
accomplished using a mathematical algorithm. For example, a mathematical algorithm useful 
for comparing two sequences is the algorithm of Karlin and Altschul (1990, Proc. Natl. Acad. 
Sci. USA 87:2264-2268), modified as in Karlin and Altschul (1993, Proc. Natl. Acad. Sci. 
USA 90:5873-5877). This algorithm is incorporated into the NBLAST and XBLAST 
programs of Altschul et al. (1990, J. Mol. Biol. 215:403-410), and can be accessed, for 
example, at the BLAST site of the National Center for Biotechnology Information (NCBI) 
world wide web site at the National Library of Medicine (NLM) at the National Institutes of 
Health (NIH). BLAST nucleotide searches can be performed with the NBLAST program 
(designated "blastn" at the NCBI web site), using the following parameters: gap penalty = 5; 
gap extension penalty = 2; mismatch penalty = 3; match reward = 1; expectation value 10.0; 
and word size = 11 to obtain nucleotide sequences homologous to a nucleic acid described 
herein. BLAST protein searches can be performed with the XBLAST program (designated 
"blastn" at the NCBI web site) or the NCBI "blastp" program, using the following 
parameters: expectation value 10.0, BLOSUM62 scoring matrix to obtain amino acid 
sequences homologous to a protein molecule described herein. 
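
As a hedged illustration, the nucleotide search parameters listed above could be passed to the stand-alone NCBI BLAST+ `blastn` program roughly as sketched below. The text itself refers to the NCBI web site rather than to a command-line tool, so the mapping of each named parameter onto a BLAST+ option is an assumption, as are the file names and the function name. The mismatch penalty of 3 is written as -3 because BLAST+ takes the penalty as a non-positive integer. The sketch only assembles the argument list; it does not run a search.

```python
from typing import List


def build_blastn_command(query_path: str, subject_path: str) -> List[str]:
    """Assemble a blastn argument list using the parameters from the text.

    Assumed mapping (not stated in the source):
      gap penalty = 5            -> -gapopen 5
      gap extension penalty = 2  -> -gapextend 2
      mismatch penalty = 3       -> -penalty -3
      match reward = 1           -> -reward 1
      expectation value 10.0     -> -evalue 10.0
      word size = 11             -> -word_size 11
    """
    return [
        "blastn",
        "-query", query_path,      # hypothetical FASTA file with the query
        "-subject", subject_path,  # hypothetical FASTA file to compare against
        "-gapopen", "5",
        "-gapextend", "2",
        "-penalty", "-3",
        "-reward", "1",
        "-evalue", "10.0",
        "-word_size", "11",
    ]


command = build_blastn_command("query.fasta", "subject.fasta")
print(" ".join(command))
```

The resulting list can be handed to `subprocess.run` on a machine where BLAST+ is installed.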

[0092] To obtain gapped alignments for comparison purposes, Gapped BLAST can be 
utilized as described in Altschul et al. (1997, Nucleic Acids Res. 25:3389-3402). 
Alternatively, PSI-Blast or PHI-Blast can be used to perform an iterated search which detects 
distant relationships between molecules (id.) and relationships between molecules which 
share a common pattern. When utilizing BLAST, Gapped BLAST, PSI-Blast, and PHI-Blast 
programs, the default parameters of the respective programs (e.g., XBLAST and NBLAST) 
can be used as available on the website of the National Center for Biotechnology Information 
of the National Library of Medicine at the National Institutes of Health. 

[0093] The percent identity between two sequences can be determined using techniques 
similar to those described above, with or without allowing gaps. In calculating percent 
identity, typically exact matches are counted. 

[0094] "Polypeptide" refers to a polymer composed of amino acid residues, related 
naturally occurring structural variants, and synthetic non-naturally occurring analogs thereof 
linked via peptide bonds, related naturally occurring structural variants, and synthetic 
non-naturally occurring analogs thereof. Synthetic polypeptides can be synthesized, for example, 
using an automated polypeptide synthesizer. A "polypeptide," as the term is used herein, 
therefore refers to any size polymer of amino acid residues, provided that the polymer 
contains at least two amino acid residues. 

[0095] The term "protein" typically refers to large peptides, also referred to herein as 
"polypeptides." The term "peptide" typically refers to short polypeptides. However, the 
terms "peptide," "protein" and "polypeptide" are used interchangeably herein. For example, 
the term "peptide" may refer to an amino acid polymer of three amino acids, as well as an 
amino acid polymer of several hundred amino acids. 

[0096] As used herein, amino acids are represented by the full name thereof, by the 
three-letter code corresponding thereto, or by the one-letter code corresponding thereto, as 
indicated in the following table: 

[0097] Conventional notation is used herein to portray polypeptide sequences: the left-hand 
end of a polypeptide sequence is the amino-terminus; the right-hand end of a polypeptide 
sequence is the carboxyl-terminus. 

[0098] A "therapeutic peptide" as the term is used herein refers to any peptide that is useful 
to treat a disease state or to improve the overall health of a living organism. A therapeutic 
peptide may effect such changes in a living organism when administered alone, or when used 
to improve the therapeutic capacity of another substance. The term "therapeutic peptide" is 
used interchangeably herein with the terms "therapeutic polypeptide" and "therapeutic 
protein." 

[0099] A "reagent peptide" as the term is used herein refers to any peptide that is useful in 
food biochemistry, bioremediation, production of small molecule therapeutics, and even in 
the production of therapeutic peptides. Typically, reagent peptides are enzymes capable of 
catalyzing a reaction to produce a product useful in any of the aforementioned areas. The 
term "reagent peptide" is used interchangeably herein with the terms "reagent polypeptide" 
and "reagent protein." 

[0100] A "glycopeptide" as the term is used herein refers to a peptide having at least one 
carbohydrate moiety covalently linked thereto. It will be understood that a glycopeptide may 
be a "therapeutic glycopeptide," as described above. The term "glycopeptide" is used 
interchangeably herein with the terms "glycopolypeptide" and "glycoprotein." 

[0101] A "vector" is a composition of matter which comprises an isolated nucleic acid and 
which can be used to deliver the isolated nucleic acid to the interior of a cell. Numerous 
vectors are known in the art including, but not limited to, linear nucleic acids, nucleic acids 
associated with ionic or amphiphilic compounds, plasmids, and viruses. Thus, the term 
"vector" includes an autonomously replicating plasmid or a virus. The term should also be 
construed to include non-plasmid and non-viral compounds which facilitate transfer of 
nucleic acid into cells, such as, for example, polylysine compounds, liposomes, and the like. 
Examples of viral vectors include, but are not limited to, adenoviral vectors, adeno-associated 
virus vectors, retroviral vectors, and the like. 

[0102] "Expression vector" refers to a vector comprising a recombinant nucleic acid 
comprising expression control sequences operatively linked to a nucleotide sequence to be 
expressed. An expression vector comprises sufficient cis-acting elements for expression; 
other elements for expression can be supplied by the host cell or in an in vitro expression 
system. Expression vectors include all those known in the art, such as cosmids, plasmids 
(e.g., naked or contained in liposomes) and viruses that incorporate the recombinant nucleic 
acid. 

[0103] A "multiple cloning site" as the term is used herein is a region of a nucleic acid 
vector that contains more than one sequence of nucleotides that is recognized by at least one 
restriction enzyme. 

[0104] An "antibiotic resistance marker" as the term is used herein refers to a sequence of 
nucleotides that encodes a protein which, when expressed in a living cell, confers to that cell 
the ability to live and grow in the presence of an antibiotic. 

[0105] As used herein, the term "GalNAcT2" refers to N-acetyl-D-galactosamine 
transferase 2. 

[0106] As the term is used herein, a "truncated" form of a peptide refers to a peptide that is 
lacking one or more amino acid residues as compared to the full-length amino acid sequence 
of the peptide. For example, the peptide "NH2-Ala-Glu-Lys-Leu-COOH" is an N-terminally 
truncated form of the full-length peptide "NH2-Gly-Ala-Glu-Lys-Leu-COOH." The terms 
"truncated form" and "truncation mutant" are used interchangeably herein. By way of a 
non-limiting example, a truncated peptide is a GalNAcT2 polypeptide comprising an active 
domain, a stem domain, a transmembrane domain, and a signal domain, wherein the signal 
domain is lacking a single N-terminal amino acid residue as compared to the full length 
GalNAcT2. 
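
The peptide example in this definition can be checked with a short Python sketch. It assumes peptides are written in the "NH2-...-COOH" three-letter notation used above; the function names and the returned labels are hypothetical.

```python
from typing import List


def parse_peptide(text: str) -> List[str]:
    """Split 'NH2-Gly-Ala-...-COOH' into its list of residues."""
    parts = text.split("-")
    if len(parts) < 3 or parts[0] != "NH2" or parts[-1] != "COOH":
        raise ValueError("expected the form NH2-Xaa-...-COOH")
    return parts[1:-1]


def classify_truncation(full_length: str, candidate: str) -> str:
    """Describe how `candidate` relates to `full_length`."""
    full = parse_peptide(full_length)
    cand = parse_peptide(candidate)
    if len(cand) >= len(full):
        return "not truncated"
    # Look for the candidate as one contiguous stretch of the full-length peptide.
    for start in range(len(full) - len(cand) + 1):
        if full[start:start + len(cand)] == cand:
            missing_n = start                          # residues lost at the N-terminus
            missing_c = len(full) - len(cand) - start  # residues lost at the C-terminus
            if missing_n and missing_c:
                return "N- and C-terminally truncated"
            return "N-terminally truncated" if missing_n else "C-terminally truncated"
    return "not a terminal truncation"


print(classify_truncation("NH2-Gly-Ala-Glu-Lys-Leu-COOH",
                          "NH2-Ala-Glu-Lys-Leu-COOH"))
```

For the two peptides given in the definition, the shorter one lacks only the N-terminal Gly, so the sketch reports an N-terminal truncation.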

[0107] The term "saccharide" refers in general to any carbohydrate, a chemical entity with 
the most basic structure of (CH2O)n. Saccharides vary in complexity, and may also include 
nucleic acid, amino acid, or virtually any other chemical moiety existing in biological 
systems. 

[0108] "Monosaccharide" refers to a single unit of carbohydrate of a defined identity. 

[0109] "Oligosaccharide" refers to a molecule consisting of several units of carbohydrates 
of defined identity. Typically, saccharide sequences between 2-20 units may be referred to as 
oligosaccharides. 

[0110] "Polysaccharide" refers to a molecule consisting of many units of carbohydrates of 
defined identity. However, any saccharide of two or more units may correctly be considered 
a polysaccharide. 



WO 2005/121331 PCT/US2005/019442

[0111] As used herein, a saccharide "donor" is a moiety that can provide a saccharide to a 
glycosyltransferase so that the glycosyltransferase may transfer the saccharide to a saccharide 
acceptor. By way of a non-limiting example, a GalNAc donor may be UDP-GalNAc. 

[0112] As used herein, a saccharide "acceptor" is a moiety that can accept a saccharide 
from a saccharide donor. A glycosyltransferase can covalently couple a saccharide to a 
saccharide acceptor. By way of a non-limiting example, G-CSF may be a GalNAc acceptor, 
and a GalNAc moiety may be covalently coupled to a GalNAc acceptor by way of a 
GalNAc-transferase. In some embodiments, a saccharide acceptor is a protein or peptide comprising 
an O-glycosylation site. In further embodiments, saccharide acceptors include, e.g., 
erythropoietin, human growth hormone, granulocyte colony stimulating factor, interferons 
-alpha, -beta, and -gamma, Factor IX, follicle stimulating hormone, interleukin-2, 
erythropoietin, anti-TNF-alpha, and a lysosomal hydrolase. 

[0113] An oligosaccharide with a "defined size" is one which consists of an identifiable 
number of monosaccharide units. For example, an oligosaccharide consisting of 10 
monosaccharide units is one which may consist of 10 identical monosaccharide units or 5 
monosaccharide units of a first identity and 5 monosaccharide units of a second identity. 
Further, an oligosaccharide of defined size that consists of monosaccharide units of 
heterogeneous identity may have the monosaccharide units in any order from beginning to 
end of the oligosaccharide. 

[0114] An oligosaccharide of "random size" is one which may be synthesized using 
methods that do not provide oligosaccharide products of defined size. For example, a method 
of oligosaccharide synthesis may provide oligosaccharides that range from two 
monosaccharide units to twenty-two saccharide units, including any or all lengths in between. 

[0115] "Commercial scale" refers to gram scale production of a product saccharide, or 
glycoprotein, or glycopeptide in a single reaction. In preferred embodiments, commercial 
scale refers to production of greater than about 50, 75, 80, 90 or 100, 125, 150, 175, or 200 
grams. 

[0116] The term "sialic acid" refers to any member of a family of nine-carbon carboxylated 
sugars. The most common member of the sialic acid family is N-acetyl-neuraminic acid 
(2-keto-5-acetamido-3,5-dideoxy-D-glycero-D-galactononulopyranos-1-onic acid (often 
abbreviated as Neu5Ac, NeuAc, or NANA)). A second member of the family is 
N-glycolyl-neuraminic acid (Neu5Gc or NeuGc), in which the N-acetyl group of NeuAc is hydroxylated. 
A third sialic acid family member is 2-keto-3-deoxy-nonulosonic acid (KDN) (Nadano et al. 
(1986) J. Biol. Chem. 261: 11550-11557; Kanamori et al., J. Biol. Chem. 265: 21811-21819 
(1990)). Also included are 9-substituted sialic acids such as a 9-O-C1-C6 acyl-Neu5Ac like 
9-O-lactyl-Neu5Ac or 9-O-acetyl-Neu5Ac, 9-deoxy-9-fluoro-Neu5Ac and 
9-azido-9-deoxy-Neu5Ac. For review of the sialic acid family, see, e.g., Varki, Glycobiology 2: 25-40 (1992); 
Sialic Acids: Chemistry, Metabolism and Function, R. Schauer, Ed. (Springer-Verlag, New 
York (1992)). The synthesis and use of sialic acid compounds in a sialylation procedure is 
disclosed in international application WO 92/16640, published October 1, 1992. 

[0117] A "method of remodeling a protein, a peptide, a glycoprotein, or a glycopeptide" as 
used herein, refers to addition of a sugar residue to a protein, a peptide, a glycoprotein, or a 
glycopeptide using a glycosyltransferase. In a preferred embodiment, the sugar residue is 
covalently attached to a PEG molecule. 

[0118] An "unpaired cysteine residue" as used herein, refers to a cysteine residue, which in 
a correctly folded protein (i.e., a protein with biological activity), does not form a disulfide 
bond with another cysteine residue. 

[0119] An "insoluble glycosyltransferase" refers to a glycosyltransferase that is expressed 
in bacterial inclusion bodies. Insoluble glycosyltransferases are typically solubilized or 
denatured using e.g., detergents or chaotropic agents or some combination. "Refolding" 
refers to a process of restoring the structure of a biologically active glycosyltransferase to a 
glycosyltransferase that has been solubilized or denatured. Thus, a "refolding buffer" refers to 
a buffer that enhances or accelerates refolding of a glycosyltransferase. 

[0120] A "redox couple" refers to mixtures of reduced and oxidized thiol reagents and 
includes reduced and oxidized glutathione (GSH/GSSG), cysteine/cystine, 
cysteamine/cystamine, DTT/GSSG, and DTE/GSSG. (See, e.g., Clark, Cur. Op. Biotech. 
12:202-207 (2001)). 

[0121] The term "contacting" is used herein interchangeably with the following: combined 
with, added to, mixed with, passed over, incubated with, flowed over, etc. 

[0122] The term "PEG" refers to poly(ethylene glycol). PEG is an exemplary polymer that 
has been conjugated to peptides. The use of PEG to derivatize peptide therapeutics has been 
demonstrated to reduce the immunogenicity of the peptides and prolong the clearance time 
from the circulation. For example, U.S. Pat. No. 4,179,337 (Davis et al.) concerns 
non-immunogenic peptides, such as enzymes and peptide hormones coupled to polyethylene 
glycol (PEG) or polypropylene glycol. Between 10 and 100 moles of polymer are used per 
mole peptide and at least 15% of the physiological activity is maintained. 

[0123] The term "specific activity" as used herein refers to the catalytic activity of an 
enzyme, e.g., a recombinant glycosyltransferase fusion protein of the present invention, and 
may be expressed in activity units. As used herein, one activity unit catalyzes the formation 
of 1 µmol of product per minute at a given temperature (e.g., at 37°C) and pH value (e.g., at 
pH 7.5). Thus, 10 units of an enzyme is a catalytic amount of that enzyme where 10 µmol of 
substrate are converted to 10 µmol of product in one minute at a temperature of, e.g., 37°C 
and a pH value of, e.g., 7.5. 
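
The unit definition above reduces to a simple ratio, sketched here in Python. The function name is hypothetical, and the measurement is assumed to have been made at a fixed temperature and pH, which the calculation itself does not model.

```python
def activity_units(product_umol: float, minutes: float) -> float:
    """Activity units, where one unit forms 1 µmol of product per minute."""
    if minutes <= 0:
        raise ValueError("reaction time must be positive")
    return product_umol / minutes


# The example from the definition: 10 µmol of product formed in one minute.
print(activity_units(10.0, 1.0))  # 10.0
```

By the same ratio, an enzyme preparation that forms 30 µmol of product over 60 minutes corresponds to 0.5 units.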

[0124] "N-linked" oligosaccharides are those oligosaccharides that are linked to a peptide 
backbone through asparagine, by way of an asparagine-N-acetylglucosamine linkage. 
N-linked oligosaccharides are also called "N-glycans." All N-linked oligosaccharides have a 
common pentasaccharide core of Man3GlcNAc2. They differ in the presence of, and in the 
number of branches (also called antennae) of peripheral sugars such as N-acetylglucosamine, 
galactose, N-acetylgalactosamine, fucose and sialic acid. Optionally, this structure may also 
contain a core fucose molecule and/or a xylose molecule. 

[0125] "O-linked" oligosaccharides are those oligosaccharides that are linked to a peptide 
backbone through threonine, serine, hydroxyproline, tyrosine, or other hydroxy-containing 
amino acids. 

[0126] The term "substantially" in the above definitions of "substantially uniform" 
generally means at least about 60%, at least about 70%, at least about 80%, or more 
preferably at least about 90%, and still more preferably at least about 95% of the acceptor 
substrates for a particular glycosyltransferase are glycosylated. 

[0127] A "fusion protein" refers to a protein comprising amino acid sequences that are in 
addition to, in place of, less than, and/or different from the amino acid sequences encoding 
the original or native full-length protein or subsequences thereof. 

[0128] A "stem region" with reference to glycosyltransferases refers to a protein domain, or 
a subsequence thereof, which in the native glycosyltransferases is located adjacent to the 
trans-membrane domain, and has been reported to function as a retention signal to maintain 
the glycosyltransferase in the Golgi apparatus and as a site of proteolytic cleavage. Stem 
regions generally start with the first hydrophilic amino acid following the hydrophobic 
transmembrane domain and end at the catalytic domain, or in some cases the first cysteine 
residue following the transmembrane domain. Exemplary stem regions include, but are not 
limited to, the stem region of eukaryotic ST6GalNAcI, amino acid residues from about 30 to 
about 207 (see e.g., the murine enzyme), amino acids 35-278 for the human enzyme or 
amino acids 37-253 for the chicken enzyme; the stem region of mammalian GalNAcT2, 
amino acid residues from about 71 to about 129 (see e.g., the rat enzyme). 

[0129] A "catalytic domain" refers to a protein domain, or a subsequence thereof, that 
catalyzes an enzymatic reaction performed by the enzyme. For example, a catalytic domain 
of a sialyltransferase will include a subsequence of the sialyltransferase sufficient to transfer 
a sialic acid residue from a donor to an acceptor saccharide. A catalytic domain can include 
an entire enzyme, a subsequence thereof, or can include additional amino acid sequences that 
are not attached to the enzyme, or a subsequence thereof, as found in nature. 

[0130] The term "isolated" refers to material that is substantially or essentially free from 
components which interfere with the activity of an enzyme. For a saccharide, protein, or 
nucleic acid of the invention, the term "isolated" refers to material that is substantially or 
essentially free from components which normally accompany the material as found in its 
native state. Typically, an isolated saccharide, protein, or nucleic acid of the invention is at 
least about 80% pure, usually at least about 90%, and preferably at least about 95% pure as 
measured by band intensity on a silver stained gel or other method for determining purity. 
Purity or homogeneity can be indicated by a number of means well known in the art. For 
example, a protein or nucleic acid in a sample can be resolved by polyacrylamide gel 
electrophoresis, and then the protein or nucleic acid can be visualized by staining. For certain 
purposes high resolution of the protein or nucleic acid may be desirable and HPLC or a 
similar means for purification, for example, may be utilized. 

Description 

1. Isolated nucleic acids 
A. Generally 

[0131] Exemplified herein are various truncation mutants of human GalNAcT2. However, 
the present invention should not be construed to cover a human GalNAcT2 truncation mutant 
polypeptide lacking amino acid residues 1-51. 

[0132] Full-length GalNAcT2 nucleic acids encode polypeptides that have a domain 
structure similar to other glycosyltransferases, including an N-terminal signal domain, a 
transmembrane domain, a stem domain, and an active domain, wherein the active domain 
may comprise the majority of the amino acid sequence of such polypeptides. As will be 
understood by one of skill in the art, the presence of domain structure(s) extraneous to the 
active domain of recombinant GalNAcT2 polypeptides may have a negative effect on the 
solubility, stability and activity of the polypeptide in an aqueous or in vitro environment. For 
example, while not wishing to be bound by any particular theory, the presence of a 
hydrophobic transmembrane domain on a recombinant GalNAcT2 polypeptide used in an in 
vitro reaction mixture may render the polypeptide less soluble than a recombinant GalNAcT2 
polypeptide without a hydrophobic transmembrane domain, and further, may even decrease 
the enzymatic activity of the polypeptide by affecting or destabilizing the folded structure. 

[0133] Therefore, it is desirable to produce recombinant GalNAcT2 nucleic acids that 
encode GalNAcT2 that is shorter than full-length GalNAcT2, for the purpose of enhancing 
the activity, stability and/or utility of GalNAcT2 polypeptides. The present invention 
provides such modified forms of GalNAcT2. More particularly, the present invention 
provides isolated nucleic acids encoding such truncated polypeptides. 

[0134] Nucleic acids of the present invention encode truncated forms of GalNAcT2 
polypeptides, as described in greater detail elsewhere herein. A truncated GalNAcT2 
polypeptide encoded by a nucleic acid of the present invention, also referred to herein as a 
"truncation mutant," may be truncated in various ways, as would be understood by the skilled 
artisan. Examples of truncated polypeptides encoded by a nucleic acid of the present 
invention include, but are not limited to, a polypeptide lacking a single N-terminal residue, a 
polypeptide lacking a single C-terminal residue, a polypeptide lacking both a single 
N-terminal residue and a single C-terminal residue, a polypeptide lacking a contiguous sequence 
of residues from the N-terminus, a polypeptide lacking a contiguous sequence of residues 
from the C-terminus, and any combinations thereof. 

[0135] Therefore, it will be understood, based on the disclosure set forth herein, that 
truncations of nucleic acids encoding GalNAcT2 polypeptides may be made for numerous 
reasons. In one embodiment of the invention, a truncation may be made in order to remove 
part or all of the nucleic acid sequence encoding the signal peptide domain of a GalNAcT2. 

[0136] In another embodiment of the invention, a truncation may be made in order to 
remove part or all of a nucleic acid sequence encoding a transmembrane domain of a 
GalNAcT2. By way of a non-limiting example, removal of a part or all of a nucleic acid 
sequence encoding a transmembrane domain may increase the solubility or stability of the 
encoded GalNAcT2 polypeptide and/or may increase the level of expression of the encoded 
polypeptide. 

[0137] In yet another embodiment of the invention, a truncation may be made in order to 
remove part or all of a nucleic acid sequence encoding a stem domain of a GalNAcT2. By 
way of a non-limiting example, removal of a part or all of a nucleic acid sequence encoding a 
stem domain may increase the solubility or stability of the encoded GalNAcT2 polypeptide 
and/or may increase the level of expression of the encoded polypeptide. 

[0138] The skilled artisan, when armed with the disclosure set forth herein, will understand 
how to design and create a truncation mutant of GalNAcT2 as set forth in detail elsewhere 
herein. In one aspect of the invention, the nucleic acid residue at which a truncation is made 
may be a highly-conserved residue. In another aspect of the invention, the nucleic acid 
residue at which a truncation is made may be selected such that the encoded polypeptide has 
a new N-terminal amino acid residue that will aid in the purification of the expressed 
polypeptide. In yet another aspect, the nucleic acid residue at which a truncation is made 
may be selected such that the encoded truncated polypeptide does not contain a specific 
secondary and/or tertiary structure. 

B. GalNAcT2 Isolated Nucleic Acids 
[0139] The present invention features nucleic acids encoding smaller than full-length 
GalNAcT2. That is, the present invention features a nucleic acid encoding a truncated 
GalNAcT2 polypeptide, provided the polypeptide expressed by the nucleic acid retains the 
biological activity of the full-length protein. In one aspect of the invention, a truncated 
polypeptide is a human truncated GalNAcT2 polypeptide. 

[0140] As would be understood by the skilled artisan, a nucleic acid encoding a full-length 
human GalNAcT2 may contain a nucleic acid sequence encoding one or more identifiable 
polypeptide domains in addition to the "active domain," the domain primarily responsible for 
the catalytic activity, of GalNAcT2. This is because it is known in the art that a full-length 
GalNAcT2 polypeptide, and in particular, a full-length human GalNAcT2 polypeptide, 
contains a signal domain, a transmembrane domain, and a stem domain, in addition to an 
active domain. Accordingly, a nucleic acid encoding a full-length human GalNAcT2 may 
encode a polypeptide that has a signal domain at the amino-terminus of the polypeptide, 
followed by a transmembrane domain immediately adjacent to the signal domain, followed 
by a stem domain that is immediately adjacent to the transmembrane domain, followed by an 
active domain that extends to the carboxy-terminus of the polypeptide and is located 
immediately adjacent to the stem domain. 

[0141] Therefore, in one embodiment, an isolated nucleic acid of the invention may encode 
a truncated human GalNAcT2 polypeptide, wherein the truncated human GalNAcT2 
polypeptide is lacking all or a portion of the GalNAcT2 signal domain. In another 
embodiment, an isolated nucleic acid of the invention may encode a truncated human 
GalNAcT2 polypeptide, wherein the truncated human GalNAcT2 polypeptide is lacking the 
GalNAcT2 signal domain and all or a portion of the GalNAcT2 transmembrane domain. In 
yet another embodiment, a nucleic acid of the invention may encode a truncated human 
GalNAcT2 polypeptide, wherein the truncated human GalNAcT2 polypeptide is lacking the 
GalNAcT2 signal domain, the GalNAcT2 transmembrane domain and all or a portion of the 
GalNAcT2 stem domain. 
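
The three embodiments above differ only in how far an N-terminal truncation reaches into the signal, transmembrane and stem domains. The following Python sketch illustrates that bookkeeping. The residue ranges are HYPOTHETICAL placeholders chosen so that the four domains are immediately adjacent; they are not the actual domain boundaries of human GalNAcT2, which are not stated here. The function and variable names are likewise assumptions made for illustration.

```python
from typing import Dict, Tuple

# HYPOTHETICAL 1-based, inclusive residue ranges, for illustration only.
# These are NOT the real GalNAcT2 domain boundaries.
EXAMPLE_DOMAINS: Dict[str, Tuple[int, int]] = {
    "signal": (1, 10),
    "transmembrane": (11, 30),
    "stem": (31, 90),
    "active": (91, 500),
}


def describe_n_terminal_truncation(
    first_retained: int,
    domains: Dict[str, Tuple[int, int]] = EXAMPLE_DOMAINS,
) -> Dict[str, str]:
    """Report each domain as 'absent', 'partial' or 'intact'.

    The truncated polypeptide lacks residues 1 .. first_retained - 1 of the
    full-length sequence and keeps everything from `first_retained` onward.
    """
    status = {}
    for name, (start, end) in domains.items():
        if first_retained > end:
            status[name] = "absent"    # every residue of the domain was removed
        elif first_retained > start:
            status[name] = "partial"   # only a portion of the domain remains
        else:
            status[name] = "intact"
    return status


# A truncation that removes the signal and transmembrane domains and a
# portion of the stem domain (with the placeholder boundaries above).
print(describe_n_terminal_truncation(41))
```

With these placeholder boundaries, a polypeptide starting at residue 41 has no signal or transmembrane domain, a partial stem domain, and an intact active domain, which corresponds to the third embodiment described above.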

[0142] When armed with the disclosure of the present invention, the skilled artisan will 
know how to make and use these and other such truncation mutants of human GalNAcT2. 

[0143] The "biological activity of GalNAcT2" is the ability to transfer a GalNAc moiety 
from a GalNAc donor to an acceptor molecule. Full-length human GalNAcT2, the sequence 
of which is set forth in SEQ ID NO:1, exhibits such activity. The "biological activity of a 
GalNAcT2 truncated polypeptide" is similarly the ability to transfer a GalNAc moiety from a 
GalNAc donor to an acceptor molecule. That is, a truncated GalNAcT2 polypeptide of the 
present invention can catalyze the same glycosyltransfer reaction as the full-length 
GalNAcT2. By way of a non-limiting example, a truncated human GalNAcT2 polypeptide 
encoded by a GalNAcT2 nucleic acid of the invention has the ability to transfer a GalNAc 
moiety from a UDP-GalNAc donor to a granulocyte-colony stimulating factor (G-CSF) 
acceptor, wherein such a transfer results in the O-linked covalent coupling of a GalNAc 
moiety to a threonine residue of G-CSF. 

[0144] Therefore, a nucleic acid encoding a smaller than full-length, or "truncated," 
GalNAcT2 is included in the present invention provided that the truncated GalNAcT2 has 
GalNAcT2 biological activity. 

[0145] The methods and compositions of the invention should not be construed to be 
limited solely to a nucleic acid comprising a GalNAcT2 truncation mutant as disclosed 
herein, but rather, should be construed to encompass any nucleic acid encoding a GalNAcT2 
truncated mutant, prepared in accordance with the disclosure herein, either known or 
unknown, which is capable of catalyzing transfer of a GalNAc to a GalNAc acceptor. 
Modified nucleic acid sequences, i.e., nucleic acid sequences having sequences that differ 
from the nucleic acid sequences encoding the naturally-occurring proteins, are also 
encompassed by methods and compositions of the invention, so long as the modified nucleic 
acid still encodes a truncated protein having the biological activity of catalyzing the transfer 
of a GalNAc to a GalNAc acceptor, for example. These modified nucleic acid sequences 
include modifications caused by point mutations, modifications due to the degeneracy of the 
genetic code or naturally occurring allelic variants, and further modifications that have been 
introduced by genetic engineering, i.e., by the hand of man. Thus, the term nucleic acid also 
specifically includes nucleic acids composed of bases other than the five biologically 
occurring bases (adenine, guanine, thymine, cytosine and uracil). 

[0146] The present invention features an isolated nucleic acid comprising a nucleic acid 
sequence that is at least about 90%, 95%, 97%, 98%, or 99% identical to a nucleic acid 
sequence set forth in any one of SEQ ID NO:3, SEQ ID NO:7 or SEQ ID NO:9. The present 
invention also features an isolated nucleic acid sequence comprising any one of the sequences 
set forth in SEQ ID NO:3, SEQ ID NO:7 or SEQ ID NO:9, wherein the isolated nucleic acid 
encodes a truncated GalNAcT2 polypeptide. 

[0147] The present invention also encompasses isolated nucleic acid molecules encoding a 
truncated GalNAcT2 polypeptide that contains changes in amino acid residues that are not 
essential for activity. Such polypeptides encoded by an isolated nucleic acid of the invention 
differ in amino acid sequence from any one of the sequences set forth in SEQ ID NO:4, SEQ 
ID NO:8 or SEQ ID NO:10, yet retain the biological activity of GalNAcT2. By way of a 
non-limiting example, an isolated nucleic acid of the invention may include a nucleotide 
sequence encoding a polypeptide having an amino acid sequence that is at least about 90%, 
95%, 97%, 98%, or 99% identical to the amino acid sequence of SEQ ID NO:4. Further, by 
way of another non-limiting example, an isolated nucleic acid of the invention may include a 
nucleotide sequence encoding a polypeptide that has an amino acid sequence at least about 
90%, 95%, 97%, 98%, or 99% identical to an amino acid sequence set forth in any one of 
SEQ ID NO:8 or SEQ ID NO:10. 

[0148] The determination of percent identity between two nucleotide or amino acid 
sequences can be accomplished using a mathematical algorithm. For example, a 
mathematical algorithm useful for comparing two sequences is the algorithm of Karlin and 
Altschul (1990, Proc. Natl. Acad. Sci. USA 87:2264-2268), modified as in Karlin and 
Altschul (1993, Proc. Natl. Acad. Sci. USA 90:5873-5877). This algorithm is incorporated
into the NBLAST and XBLAST programs of Altschul, et al. (1990, J. Mol. Biol. 215:403-410),
and can be accessed, for example at the National Center for Biotechnology Information
(NCBI) world wide web site. BLAST nucleotide searches can be performed with the
NBLAST program (designated "blastn" at the NCBI web site), using the following
parameters: gap penalty = 5; gap extension penalty = 2; mismatch penalty = 3; match reward
= 1; expectation value 10.0; and word size = 11 to obtain nucleotide sequences homologous
to a nucleic acid described herein. BLAST protein searches can be performed with the
XBLAST program (designated "blastn" at the NCBI web site) or the NCBI "blastp" program,
using the following parameters: expectation value 10.0, BLOSUM62 scoring matrix to
obtain amino acid sequences homologous to a protein molecule described herein. To obtain
gapped alignments for comparison purposes, Gapped BLAST can be utilized as described in
Altschul et al. (1997, Nucleic Acids Res. 25:3389-3402). Alternatively, PSI-Blast or PHI-Blast
can be used to perform an iterated search which detects distant relationships between
molecules and relationships between molecules which share a common pattern. When
utilizing BLAST, Gapped BLAST, PSI-Blast, and PHI-Blast programs, the default
parameters of the respective programs (e.g., XBLAST and NBLAST) can be used. See,
generally, the internet website for the National Center for Biotechnology Information, which
is maintained by the National Library of Medicine and the National Institutes of Health.
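
By way of illustration only, percent identity can be computed for two sequences that have already been aligned. The sketch below is not an implementation of BLAST or of the Karlin and Altschul algorithm; it assumes the alignment has been produced elsewhere, uses "-" as the gap character, and divides by the number of aligned columns, which is only one of several conventions in use. The function names and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Columns in which both sequences carry a gap are ignored; a column
    in which only one sequence carries a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical ten-column alignment with a single mismatch.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
print(meets_identity_threshold("ACGTACGTAC", "ACGTACGTAA", 90.0))
```

A threshold such as 90%, 95%, 97%, 98%, or 99% can then be checked with `meets_identity_threshold`; results may differ slightly from those of a BLAST report, which applies its own alignment and scoring.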

[0149] In another aspect, a nucleic acid useful in the methods and compositions of the 
present invention and encoding a truncated GalNAcT2 polypeptide may have at least one
nucleotide inserted into the nucleic acid sequence of such a truncated mutant. Alternatively,
an additional nucleic acid encoding a truncated GalNAcT2 polypeptide may have at least one
nucleotide deleted from the nucleic acid sequence. Further, a GalNAcT2 nucleic acid
encoding a truncated mutant and useful in the invention may have both a nucleotide insertion
and a nucleotide deletion present in a single nucleic acid sequence encoding the truncated
polypeptide.

[0150] Techniques for introducing changes in nucleotide sequences that are designed to 
alter the functional properties of the encoded proteins or polypeptides are well known in the
art. Such modifications include the deletion, insertion, or substitution of bases, and thus,
changes in the amino acid sequence. As is known to one of skill in the art, nucleic acid
insertions and/or deletions may be designed into the gene for numerous reasons, including,
but not limited to, modification of nucleic acid stability, modification of nucleic acid
expression levels, modification of expressed polypeptide stability or half-life, modification of
expressed polypeptide activity, modification of expressed polypeptide properties and
characteristics, and changes in glycosylation pattern. All such modifications to the nucleotide
sequences encoding such proteins are encompassed by the present invention.
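
The effect of an insertion or deletion on the reading frame can be sketched as follows. This is an illustrative example only: the helper `codons` and the twelve-nucleotide reference sequence are hypothetical and do not correspond to any sequence disclosed herein.

```python
from typing import List


def codons(seq: str) -> List[str]:
    """Split a coding sequence into successive triplets.

    A trailing partial codon, if any, is kept so that a disrupted
    reading frame remains visible in the output.
    """
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


# Hypothetical reference coding sequence (four codons).
reference = "ATGGCTAAAGGT"

# Single-nucleotide insertion after the first codon: every downstream
# codon changes because the reading frame is shifted.
insertion = reference[:3] + "C" + reference[3:]

# Deletion of one whole codon (three nucleotides): the reading frame is
# preserved and a single residue is removed.
deletion = reference[:3] + reference[6:]

print(codons(reference))
print(codons(insertion))
print(codons(deletion))
```

In this sketch the single-nucleotide insertion shifts the frame and happens to create a stop codon (TAA) at the third position, whereas the three-nucleotide deletion leaves the remaining codons unchanged.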

[0151] It is not intended that methods and compositions of the present invention be limited 
by the nature of the nucleic acid employed. The target nucleic acid encompassed by methods
and compositions of the invention may be native or synthesized nucleic acid. The nucleic
acid may be DNA or RNA and may exist in a double-stranded, single-stranded or partially
double-stranded form. Furthermore, the nucleic acid may be found as part of a virus or other
macromolecule. See, e.g., Fasbender et al., 1996, J. Biol. Chem. 272:6479-89.

II. Vectors and Expression Systems

[0152] In other related aspects, the invention includes an isolated nucleic acid encoding a 
truncated GalNAcT2 polypeptide operably linked to a nucleic acid comprising a 
promoter/regulatory sequence such that the nucleic acid is preferably capable of directing 
expression of the polypeptide encoded by the nucleic acid. Thus, the invention encompasses 

expression vectors and methods for the introduction of exogenous DNA into cells with
concomitant expression of the exogenous DNA in those cells, as described, for example, in
Sambrook et al. (Third Edition, 2001, Molecular Cloning: A Laboratory Manual, Cold 
Spring Harbor Laboratory, New York), and in Ausubel et al. (1997, Current Protocols in 
Molecular Biology, John Wiley & Sons, New York). 

[0153] Expression of a truncated GalNAcT2 polypeptide in a cell may be accomplished by
generating a plasmid, viral, or other type of vector comprising a nucleic acid encoding the
appropriate polypeptide, wherein the nucleic acid is operably linked to a promoter/regulatory
sequence which serves to drive expression of the encoded polypeptide, with or without a tag, in
cells into which the vector is introduced. In addition, promoters which are well known in the
art and which are induced in response to inducing agents such as metals, glucocorticoids, and
the like, are also contemplated in the invention. Thus, it will be appreciated that the invention
includes the use of any promoter/regulatory sequence, which is either known or unknown,
and which is capable of driving expression of the truncated GalNAcT2 polypeptide operably
linked thereto.

[0154] In an expression system useful in the present invention, a nucleic acid encoding a 
truncated GalNAcT2 polypeptide may be fused to one or more additional nucleic acids 
encoding a functional polypeptide. By way of a non-limiting example, an affinity tag coding
sequence may be inserted into a nucleic acid vector adjacent to, upstream from, or
downstream from a truncated GalNAcT2 polypeptide coding sequence. As will be
understood by one of skill in the art, an affinity tag will typically be inserted into a multiple
cloning site in frame with the truncated GalNAcT2 polypeptide. One of skill in the art will
also understand that an affinity tag coding sequence can be used to produce a recombinant
fusion protein by concomitantly expressing the affinity tag and truncated GalNAcT2 
polypeptide. The expressed fusion protein can then be isolated, purified, or identified by 
means of the affinity tag. 
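
The requirement that a tag be placed in frame with the coding sequence can be sketched in a few lines. The example below is a simplified illustration, assuming an N-terminal tag joined directly to the target with no linker or cloning-site sequence; the function `fuse_in_frame`, the six-histidine tag sequence, and the short target sequence are hypothetical placeholders and are not GalNAcT2 sequences.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_in_frame(tag_cds: str, target_cds: str) -> str:
    """Join an N-terminal tag coding sequence to a target coding sequence in frame."""
    tag_cds, target_cds = tag_cds.upper(), target_cds.upper()
    for name, cds in (("tag", tag_cds), ("target", target_cds)):
        if len(cds) % 3 != 0:
            raise ValueError(f"{name} coding sequence is not a whole number of codons")
    # Drop a terminal stop codon from the tag so that translation
    # continues into the target coding sequence.
    if tag_cds[-3:] in STOP_CODONS:
        tag_cds = tag_cds[:-3]
    return tag_cds + target_cds


# Hypothetical tag: a start codon followed by six histidine codons (CAT).
his6 = "ATG" + "CAT" * 6
# Hypothetical placeholder target ending in a stop codon.
target = "GCTAAAGGTTAA"

fusion = fuse_in_frame(his6, target)
print(fusion, len(fusion) % 3 == 0)
```

Because both parts are whole numbers of codons, the fusion is read in a single frame and the stop codon of the target terminates the fusion protein.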

[0155] Affinity tags useful in the present invention include, but are not limited to, a maltose
binding protein, a histidine tag, a Factor IX tag, a glutathione-S-transferase tag, a FLAG-tag,
and a starch binding domain tag. Other tags are well known in the art, and the use of such
tags in the present invention would be readily understood by the skilled artisan.

[0156] As would be understood by one of skill in the art, a vector comprising a nucleic acid
encoding a truncated GalNAcT2 polypeptide of the present invention may be used to express
the truncated polypeptide as either a non-fusion or a fusion protein. Selection of any particular
plasmid vector or other DNA vector is not a limiting factor in this invention and a wide plethora
of vectors are well-known in the art. Further, it is well within the skill of the artisan to choose
particular promoter/regulatory sequences and operably link those promoter/regulatory
sequences to a DNA sequence encoding a truncated GalNAcT2 polypeptide. Such
technology is well known in the art and is described, for example, in Sambrook et al. (Third
Edition, 2001, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory,
New York), and in Ausubel et al. (1997, Current Protocols in Molecular Biology, John Wiley
& Sons, New York). By way of a non-limiting example, a vector useful in one embodiment
of the present invention is based on the pcWori+ vector (Muchmore et al., 1987, Meth.
Enzymol. 177:44-73).

[0157] The invention thus includes a vector comprising an isolated nucleic acid encoding a 
truncated GalNAcT2 polypeptide. The incorporation of a nucleic acid into a vector and the
choice of vectors are well-known in the art as described in, for example, Sambrook et al.
(Third Edition, 2001, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor 
Laboratory, New York), and in Ausubel et al. (1997, Current Protocols in Molecular Biology, 
John Wiley & Sons, New York). 

[0158] In an aspect of the invention, an isolated nucleic acid encoding a truncated
GalNAcT2 polypeptide is integrated into the genome of a host cell in conjunction with a
nucleic acid encoding a truncated GalNAcT2 polypeptide. In another aspect of the invention,
a cell is transiently transfected with an isolated nucleic acid encoding a truncated GalNAcT2
polypeptide. In yet another aspect of the invention, a cell is stably transfected with an
isolated nucleic acid encoding a truncated GalNAcT2 polypeptide.

[0159] For the purpose of inserting an isolated nucleic acid into a cell, one of skill in the art 
would also understand that the methods available and the methods required to introduce an 
isolated nucleic acid of the invention into a host cell vary and depend upon the choice of host 
cell. Suitable methods of introducing an isolated nucleic acid into a host cell are well-known 
in the art. Other suitable methods for transforming or transfecting host cells may include, but
are not limited to, those found in Sambrook, et al. (Molecular Cloning: A Laboratory Manual.
3rd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring 
Harbor, N.Y., 2001), and other such laboratory manuals. 

[0160] A nucleic acid encoding a truncated GalNAcT2 polypeptide may be purified by any 
suitable means, as are well known in the art. For example, the nucleic acids can be purified
by reverse phase or ion exchange HPLC, size exclusion chromatography or gel 
electrophoresis. Of course, the skilled artisan will recognize that the method of purification 
will depend in part on the size of the DNA to be purified. 

[0161] The present invention also features a recombinant bacterial host cell comprising,
inter alia, a nucleic acid vector as described elsewhere herein. In one aspect, the recombinant
cell is transformed with a vector of the present invention. The transformed vector need not be
integrated into the cell genome nor does it need to be expressed in the cell. However, the
transformed vector will be capable of being expressed in the cell. In one aspect of the
invention, E. coli is used for transformation of a vector of the present invention and
expression of protein therefrom. In another aspect of the invention, a K-12 strain of E. coli is
useful for expression of protein from a vector of the present invention. Strains of E. coli
useful in the present invention include, but are not limited to, JM83, JM101, JM103, JM109,
W3110, chi1776, and JA221.

[0162] It will be understood that a host cell useful in the present invention will be capable 
of growth and culture on a small scale, medium scale, or a large scale. For example, a host 
cell of the invention is useful for testing the expression of a protein from a vector of the
invention equally as much as it is useful for large scale production of a reagent or therapeutic
protein product. Techniques useful in culturing host cells and expressing protein from a
vector contained therein are well known in the art and will therefore not be listed herein. 

[0163] A host cell useful in methods of the present invention, as described above, may be
prepared according to various methods, as would be understood by the skilled artisan when
armed with the disclosure set forth herein. In one aspect, a host cell of the present invention
may be transformed with a vector of the present invention to produce a transformed host cell
of the invention. Transformation, as known to the skilled artisan, includes the process of
inserting a nucleic acid vector into a host cell, such that the host cell containing the nucleic
acid vector remains viable. Such transformation of nucleic acid into a bacterial cell is useful
for purposes including, but not limited to, creation of a stably-transformed host cell, making a
biological deposit, propagating the vector-containing host cell, propagating the
vector-containing host cell for the production and isolation of additional vector, expression of
target protein encoded by the vector, and the like.

[0164] Methods of transforming a cell with a vector are numerous and well-known in the 
art, and will therefore not be listed here. By way of a non-limiting example, a competent
bacterial cell of the invention may be transformed by a vector of the invention using
electroporation. Methods of making bacterial cells "competent" are well-known in the art,
and typically involve preparation of the bacterial cells so that the cells take up exogenous
DNA. Similarly, methods of electroporation are known in the art, and detailed descriptions
of such methods may be found, for example, in Sambrook et al. (1989, supra). The
transformation of a competent cell with vector DNA may also be accomplished using
chemical-based methods. One example of a well-known chemical-based method of bacterial
transformation is described by Inoue, et al. (1990, Gene 96:23-28). Other methods of
transformation will be known to the skilled artisan. 

[0165] A transformed host cell of the present invention may be used to express a truncated
GalNAcT2 polypeptide of the present invention. In an embodiment of the invention, a
transformed host cell contains a vector of the invention, which contains therein a nucleic acid
sequence encoding a truncated polypeptide of the invention. The truncated polypeptide is
expressed using any expression method known in the art (for example, induction with IPTG).
The expressed truncated polypeptide may be contained within the host cell, or it may be
secreted from the host cell into the growth medium.

[0166] Methods for isolating an expressed polypeptide are well-known in the art, and the 
skilled artisan will know how to determine the best method for isolation of an expressed 
polypeptide based on the characteristics of any given host cell expression system. By way of 
a non-limiting example, an expressed polypeptide that is secreted from a host cell may be 
isolated from the growth medium. Isolation of a polypeptide from a growth medium may 
include removal of bacterial cells and cellular debris. By way of another non-limiting 
example, an expressed polypeptide that is contained within a host cell may be isolated from 
the host cell. Isolation of such an "intracellular" expressed polypeptide may include 
disruption of the host cell and removal of cellular debris from the resultant mixture. These 
methods are not intended to be exclusive representations of the present invention, but rather, 
are merely for the purposes of illustration of various applications of the present invention.

[0167] Purification of a truncated polypeptide expressed in accordance with the present 
invention may be effected by any means known in the art. The skilled artisan will know how 
to determine the best method for the purification of a polypeptide expressed in accordance 
with the present invention. A purification method will be chosen by the skilled artisan based 
on factors such as, but not limited to, the expression host, the contents of the crude extract of 
the polypeptide, the size of the polypeptide, the properties of the polypeptide, the desired end 
product of the polypeptide purification process, and the subsequent use of the end product of 
the polypeptide purification process.

[0168] In an embodiment of the invention, isolation or purification of a truncated
polypeptide expressed in accordance with the present invention may not be desired. In an
aspect of the present invention, an expressed polypeptide may be stored or transported inside
the bacterial host cell in which the polypeptide was expressed. In another aspect of the
invention, an expressed polypeptide may be used in a crude lysate form, which is produced
by lysis of a host cell in which the polypeptide was expressed. In yet another embodiment of
the invention, an expressed polypeptide may be partially isolated or partially purified
according to any of the methods set forth or described herein. The skilled artisan will know
when it is not desirable to isolate or purify a polypeptide of the invention, and will be familiar
with the techniques available for the use and preparation of such polypeptides.

[0169] When armed with the disclosure set forth herein, the skilled artisan would also
know how to prepare a eukaryotic host cell of the invention. As set forth elsewhere herein,
and as would be known to one of skill in the art based on the disclosure provided herein, an
isolated nucleic acid encoding a truncated GalNAcT2 polypeptide may be introduced into a
eukaryotic host cell, for example, using a lentivirus-based genomic integration or
plasmid-based transfection (Sambrook et al., Third Edition, Molecular Cloning: A Laboratory
Manual, Cold Spring Harbor Laboratory, New York (2001)). In one embodiment of the
invention, a eukaryotic host cell is a fungal cell. In another embodiment, a nucleic acid 
encoding a truncated polypeptide of the invention is cloned into a lentiviral vector containing 
a specific promoter sequence for expression of the truncated polypeptide. The truncated 
polypeptide-containing lentiviral vector is then used to transfect a host cell for expression of 
the truncated polypeptide. Methods of making and using lentiviral vectors, such as those 
useful in the present invention, are well-known in the art and are not described further herein. 

[0170] In yet another embodiment, a nucleic acid encoding a truncated polypeptide of the 
invention is introduced into a host cell using a viral expression system. Viral expression 
systems are well-known in the art, and will not be described in detail herein. In one aspect of
the invention, a viral expression system is a mammalian viral expression system. In another
aspect of the invention, a viral expression system is a baculovirus expression system. Such
viral expression systems are typically commercially available from numerous vendors.

[0171] The skilled artisan will know how to use a host cell-vector expression system for the 
expression of a truncated polypeptide of the invention. Appropriate cloning and expression 
vectors for use with eukaryotic hosts are described by Sambrook, et al., in Molecular 
Cloning: A Laboratory Manual, Third Edition, Cold Spring Harbor, N.Y. (2001), the 
disclosure of which is hereby incorporated in its entirety by reference.

[0172] Insect cells can also be used for expression of a truncated polypeptide of the present
invention. In an aspect of the invention, Sf9, SF9+, Sf21, High Five™ or Drosophila
Schneider S2 cells can be used. In yet another aspect of the invention, a baculovirus, or a
baculovirus/insect cell expression system can be used to express a truncated polypeptide of
the invention using a pAcGP67, pFastBac, pMelBac, or pIZ vector and a polyhedrin, p10, or
OpIE3 actin promoter. In another aspect of the invention, a Drosophila expression system
can be used with a pMT or pAC5 vector and an MT or Ac5 promoter.

[0173] A truncated GalNAcT2 polypeptide of the invention can also be expressed in
mammalian cells. In an aspect of the invention, 294, HeLa, HEK, NSO, Chinese hamster
ovary (CHO), Jurkat, or COS cells can be used to express a truncated polypeptide of the
invention. In the case of mammalian cell expression of a truncated polypeptide, a suitable
vector such as a pT-Rex, pSecTag2, pBudCE4.1, or pCDNA/His Max vector can be used,
along with, for example, a CMV promoter. As will be understood by the skilled artisan, the
choice of promoter, as well as methods and strategies for introducing one or more promoters
into a host cell used for expressing a truncated GalNAcT2 polypeptide of the invention, are
well-known in the art, and will vary depending upon the host cell and expression system
used.

[0174] Various mammalian cell culture systems can be employed to express recombinant 
protein. Non-limiting examples of mammalian expression systems include the COS-7 lines of 
monkey kidney fibroblasts, described by Gluzman, Cell 23:175 (1981), and other cell lines
capable of expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa and
BHK cell lines. Mammalian expression vectors may comprise an origin of replication, a
suitable promoter and also any necessary ribosome binding sites, polyadenylation site, splice
donor and acceptor sites, transcriptional termination sequences, and 5' flanking
nontranscribed sequences. DNA sequences derived from the SV40 viral genome, for
example, SV40 origin, early promoter, enhancer, splice, and polyadenylation sites may be
used to provide the required nontranscribed genetic elements. 

[0175] The methods available and the methods required to introduce any isolated nucleic 
acid of the invention into a host cell vary and depend upon the choice of the host cell, as
would be understood by one of skill in the art. Suitable methods of introducing an isolated
nucleic acid into a host cell are well-known in the art. By way of a non-limiting example,
vector DNA can be introduced into a eukaryotic cell using conventional transfection
techniques. As used herein, the term "transfection" refers to a variety of art-recognized
techniques for introducing foreign nucleic acid (e.g., DNA) into a host cell, including
DEAE-dextran-mediated transfection, lipofection, or electroporation. Suitable methods for
transforming or transfecting host cells can be found in Sambrook, et al. (Molecular Cloning:
A Laboratory Manual. 3rd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, N.Y., 2001), and other such laboratory manuals.

[0176] For example, for stable transfection of mammalian cells, it is known that, depending 
upon the expression vector and transfection technique used, only a small fraction of cells may 
integrate the foreign DNA into their genome. In order to identify and select these
transformants, a gene that encodes a selectable marker (e.g., resistance to antibiotics) is
generally introduced into the host cells along with the gene of interest. Various selectable
markers include those that confer resistance to drugs, such as G418, hygromycin and
methotrexate. Nucleic acid encoding a selectable marker can be introduced into a host cell on
the same vector as that encoding a truncated polypeptide of the invention or can be
introduced on a separate vector. Cells stably transfected with the introduced nucleic acid can
be identified by drug selection (e.g., cells that have incorporated the selectable marker gene
will survive, while the other cells die). 

III. Polypeptides

[0177] A truncated GalNAcT2 polypeptide of the present invention may be truncated in
various ways, as would be known and understood by the skilled artisan, when armed with the
present disclosure. Examples of truncated polypeptides of the present invention include, but
are not limited to, a polypeptide lacking a single N-terminal residue, a polypeptide lacking a
single C-terminal residue, a polypeptide lacking both a single N-terminal residue and a
single C-terminal residue, a polypeptide lacking a contiguous sequence of residues from the 
N-terminus, a polypeptide lacking a contiguous sequence of residues from the C-terminus, 
and any such combinations thereof. 

[0178] As would be understood by the skilled artisan, a full-length human GalNAcT2 
polypeptide may contain one or more identifiable polypeptide domains in addition to the
"active domain," the domain primarily responsible for the catalytic activity of GalNAcT2.
This is because it is known in the art that a full-length GalNAcT2 polypeptide, and in
particular, a full-length human GalNAcT2 polypeptide, contains a signal domain, a 
transmembrane domain, and a stem domain, in addition to an active domain. Accordingly, a 
full-length human GalNAcT2 may have a signal domain at the amino-terminus of the 
polypeptide, followed by a transmembrane domain immediately adjacent to the signal 
domain, followed by a stem domain that is immediately adjacent to the transmembrane 






wo 2005/121331 



PCT/US2005/019442 



domain, followed by an active domain that extends to the carboxy-terminus of the 
polypeptide and is located immediately adjacent to the stem domain. 

[0179] Therefore, in one embodiment, a GalNAcT2 polypeptide of the invention is a 
truncated human GalNAcT2 polypeptide lacking all or a portion of the GalNAcT2 signal
domain. In another embodiment, a GalNAcT2 polypeptide of the invention is a truncated
human GalNAcT2 polypeptide lacking the GalNAcT2 signal domain and all or a portion of
the GalNAcT2 transmembrane domain. In yet another embodiment, a GalNAcT2
polypeptide of the invention is a truncated human GalNAcT2 polypeptide lacking the
GalNAcT2 signal domain, the GalNAcT2 transmembrane domain and all or a portion of the
GalNAcT2 stem domain. When armed with the disclosure of the present invention, the 
skilled artisan will know how to make and use these and other such truncation mutants of 
human GalNAcT2. 

[0180] The size and identity of a truncated GalNAcT2 mutant of the present invention are
based on the point at which the full-length polypeptide is truncated. By way of a
non-limiting example, a "Δ40 human truncated GalNAcT2" mutant of the invention refers to a
truncated GalNAcT2 polypeptide of the invention in which amino acids 1 through 40,
counting from the N-terminus of the full-length polypeptide, are deleted from the
polypeptide. Therefore, the N-terminus of the Δ40 human truncated GalNAcT2 mutant
begins with the amino acid residue that would be referred to as "amino acid 41" of the
full-length polypeptide. This nomenclature applies to all truncated GalNAcT2 polypeptides of
the invention, including human GalNAcT2.
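
The truncation nomenclature described above, in which a mutant is named for the number of residues deleted from the N-terminus, can be expressed as a short sketch. The function `truncate_n_terminal` and the 60-residue placeholder sequence are hypothetical; the placeholder is not the sequence of GalNAcT2 or of any SEQ ID NO.

```python
def truncate_n_terminal(full_length: str, n_deleted: int) -> str:
    """Return the polypeptide remaining after deleting residues 1..n_deleted.

    Residue numbering is 1-based, counting from the N-terminus, so the
    first residue of the truncated polypeptide is residue n_deleted + 1
    of the full-length polypeptide.
    """
    if not 0 <= n_deleted < len(full_length):
        raise ValueError("n_deleted must be between 0 and len(full_length) - 1")
    return full_length[n_deleted:]


# Hypothetical 60-residue placeholder sequence (one-letter amino acid codes).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
full_length = "".join(ALPHABET[i % 20] for i in range(60))

# Deleting amino acids 1 through 40 leaves a polypeptide that begins at
# "amino acid 41" of the full-length sequence.
truncated_40 = truncate_n_terminal(full_length, 40)
print(len(truncated_40), truncated_40[0] == full_length[40])
```

With 0-based string indexing, index 40 corresponds to residue 41 in the 1-based numbering used in the text.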

[0181] The present invention therefore also includes an isolated polypeptide comprising a 
truncated GalNAcT2 polypeptide. Preferably, an isolated truncated GalNAcT2 polypeptide
of the present invention has at least about 90% identity to a polypeptide having the amino
acid sequence of any one of the sequences set forth in SEQ ID NO:4, SEQ ID NO:8 or SEQ
ID NO:10. More preferably, the isolated polypeptide is about 95% identical, and even more
preferably, about 98% identical, still more preferably, about 99% identical, and most
preferably, the isolated polypeptide comprising a truncated GalNAcT2 polypeptide is
identical to the polypeptide set forth in one of SEQ ID NO:4, SEQ ID NO:8 or SEQ ID
NO:10.

[0182] The present invention also provides for analogs of polypeptides which comprise a
truncated GalNAcT2 polypeptide as disclosed herein. Analogs can differ from naturally
occurring proteins or peptides by conservative amino acid sequence differences or by
modifications which do not affect sequence, or by both.

[0183] For example, conservative amino acid changes may be made, which although they 
alter the primary sequence of the protein or peptide, do not normally alter its function. 
Conservative amino acid substitutions typically include substitutions within the following
groups:

glycine, alanine;
valine, isoleucine, leucine;
aspartic acid, glutamic acid;
asparagine, glutamine;
serine, threonine;
lysine, arginine;
phenylalanine, tyrosine.
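
The groups listed above can be encoded directly as a lookup, as in the following minimal sketch. The names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical, and the standard one-letter amino acid codes are used in place of the full residue names.

```python
# The seven groups listed above, in one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    {"G", "A"},       # glycine, alanine
    {"V", "I", "L"},  # valine, isoleucine, leucine
    {"D", "E"},       # aspartic acid, glutamic acid
    {"N", "Q"},       # asparagine, glutamine
    {"S", "T"},       # serine, threonine
    {"K", "R"},       # lysine, arginine
    {"F", "Y"},       # phenylalanine, tyrosine
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if replacing `original` with `replacement` stays within one listed group."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return False  # identical residues are not a substitution
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS
    )


print(is_conservative("V", "L"), is_conservative("G", "V"))
```

Because the text states that conservative substitutions "typically include" these groups, the lookup is illustrative rather than exhaustive.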

[0184] Modifications (which do not normally alter primary sequence) include in vivo or in
vitro chemical derivatization of polypeptides, e.g., acetylation or carboxylation. Also
included are modifications of glycosylation, e.g., those made by modifying the glycosylation
patterns of a polypeptide during its synthesis and processing or in further processing steps,
e.g., by exposing the polypeptide to enzymes which affect glycosylation, e.g., mammalian
glycosylating or deglycosylating enzymes. Also embraced are sequences which have
phosphorylated amino acid residues, e.g., phosphotyrosine, phosphoserine, or
phosphothreonine.

[0185] Also included are polypeptides which have been modified using ordinary molecular 
biological techniques so as to improve their resistance to proteolytic degradation or to 
optimize solubility properties or to render them more suitable as a therapeutic agent. Analogs
of such polypeptides include those containing residues other than naturally occurring L- 
amino acids, e.g., D-amino acids or non-naturally occurring synthetic amino acids. The 
peptides of the invention are not limited to products of any of the specific exemplary 
processes listed herein. 

[0186] Fragments of a truncated GalNAcT2 polypeptide of the invention are included in
the present invention, provided the fragment possesses the biological activity of the
full-length polypeptide. That is, a truncated GalNAcT2 polypeptide of the present invention can
catalyze the same glycosyltransfer reaction as the full-length GalNAcT2. By way of a
non-limiting example, a truncated human GalNAcT2 polypeptide has the ability to transfer a
GalNAc moiety from a UDP-GalNAc donor to a granulocyte-colony stimulating factor
(G-CSF) acceptor, wherein such a transfer results in the O-linked covalent coupling of a GalNAc
moiety to a threonine residue of G-CSF. Therefore, a smaller than full-length, or "truncated,"
GalNAcT2 is included in the present invention provided that the truncated GalNAcT2 has 
GalNAcT2 biological activity. 

[0187] In another aspect of the present invention, compositions comprising an isolated
truncated GalNAcT2 polypeptide as described herein may include highly purified truncated
GalNAcT2 polypeptides. Alternatively, compositions comprising truncated GalNAcT2
polypeptides may include cell lysates prepared from the cells used to express the particular
truncated GalNAcT2 polypeptides. Further, truncated GalNAcT2 polypeptides of the present
invention may be expressed in one of any number of cells suitable for expression of
polypeptides, such cells being well-known to one of skill in the art, as described in detail
elsewhere herein.

[0188] Substantially pure protein isolated and obtained as described herein may be purified 
by following known procedures for protein purification, wherein an immunological,
enzymatic or other assay is used to monitor purification at each stage in the procedure.
Protein purification methods are well known in the art, and are described, for example, in
Deutscher et al. (ed., 1990, Guide to Protein Purification, Harcourt Brace Jovanovich, San
Diego). 

IV. Methods 

[0189] The present invention features a method of expressing a truncated polypeptide.
Polypeptides which can be expressed according to the methods of the present invention
include a truncated GalNAcT2 polypeptide. More preferably, polypeptides which can be
expressed according to the methods of the present invention include, but are not limited to, a
truncated human GalNAcT2 polypeptide. In a preferred embodiment, a polypeptide which
can be expressed according to the methods of the present invention is a polypeptide
comprising any one of the polypeptide sequences set forth in SEQ ID NO:4, SEQ ID NO:8 or
SEQ ID NO:10.

[0190] In one embodiment, the present invention features a method of expressing a
truncated GalNAcT2 polypeptide encoded by an isolated nucleic acid of the invention, as
described elsewhere herein, wherein the expressed truncated GalNAcT2 polypeptide has the
property of catalyzing the transfer of a GalNAc moiety to an acceptor moiety. In one aspect
of the invention, a method of expressing a truncated GalNAcT2 polypeptide includes the
steps of cloning an isolated nucleic acid of the invention into an expression vector, inserting
the expression vector construct into a host cell, and expressing a truncated GalNAcT2
polypeptide therefrom.

[0191] Methods of expression of polypeptides, as well as construction of expression
systems and recombinant host cells for expression of polypeptides, are discussed in extensive
detail elsewhere herein. Methods of expression of a truncated polypeptide of the present
invention will be understood to include, but not to be limited to, all such methods as
described herein. In some expression systems, the truncated GalNAcT2 polypeptides of the
invention are expressed as insoluble proteins, e.g., in an inclusion body in a bacterial host
cell. Methods of refolding insoluble glycosyltransferases, including GalNAcT2 polypeptides,
are disclosed in U.S. Provisional Patent Application Serial No. 60/542,210, filed February 4,
2004; U.S. Provisional Patent Application Serial No. 60/599,406, filed August 6, 2004; U.S.
Provisional Patent Application Serial No. 60/627,406, filed November 12, 2004; and
International Patent Application No. PCT/US05/03856, filed February 4, 2005; each of which
is herein incorporated by reference for all purposes.

[0192] The present invention also features a method of catalyzing the transfer of a GalNAc 
moiety to a GalNAc acceptor moiety, wherein the GalNAc-transfer reaction is carried out by 
incubating a truncated GalNAcT2 polypeptide of the invention with a GalNAc donor moiety 
and a GalNAc acceptor moiety. In one aspect, a truncated GalNAcT2 polypeptide of the 
invention mediates the covalent linkage of a GalNAc moiety to a GalNAc acceptor moiety, 
thereby catalyzing the transfer of a GalNAc moiety to an acceptor moiety. 

[0193] In one embodiment of the invention, a truncated GalNAcT2 polypeptide useful in a
glycosyltransfer reaction is a truncated human GalNAcT2 polypeptide. In one aspect, the
human GalNAcT2 glycosyltransfer reaction involves the transfer of a GalNAc residue from a
GalNAc donor to a GalNAc acceptor.

[0194] By way of a non-limiting example, a method of catalyzing the transfer of a GalNAc
moiety to an acceptor moiety includes the steps of incubating a truncated GalNAcT2
polypeptide with a UDP-GalNAc donor and a granulocyte colony stimulating factor
(G-CSF) acceptor moiety, wherein the truncated GalNAcT2 polypeptide mediates the transfer
of GalNAc from the UDP-GalNAc donor to the G-CSF acceptor.

[0195] Therefore, in one embodiment, the present invention also features a polypeptide
acceptor moiety. In one embodiment of the invention, a polypeptide acceptor moiety is a
human growth hormone. In another embodiment, a polypeptide acceptor moiety is an
erythropoietin. In yet another embodiment, a polypeptide acceptor moiety is an
interferon-alpha. In another embodiment, a polypeptide acceptor moiety is an interferon-beta.
In another embodiment of the invention, a polypeptide acceptor moiety is an interferon-gamma.
In still another embodiment of the invention, a polypeptide acceptor moiety is a lysosomal
hydrolase. In another embodiment, a polypeptide acceptor moiety is a blood factor
polypeptide. In still another embodiment, a polypeptide acceptor moiety is an anti-tumor
necrosis factor-alpha. In another embodiment of the invention, a polypeptide acceptor moiety
is follicle stimulating hormone.

[0196] In one embodiment, the present invention also features a method of transferring a
GalNAc-polyethyleneglycol conjugate to an acceptor molecule. In one aspect, an acceptor
molecule is a polypeptide. In another aspect, an acceptor molecule is a glycopeptide.
Compositions and methods useful for designing, producing and transferring a
GalNAc-polyethyleneglycol conjugate to an acceptor molecule are discussed at length in
International (PCT) Patent Application No. WO03/031464 (PCT/US02/32263) and U.S. Patent
Application No. 2004/0063911, each of which is incorporated herein by reference in its
entirety. Methods of assaying for glycosyltransferase activity are well-known in the art.
Various assays for detecting glycosyltransferases which can be used in accordance with the
invention have been published. The following are illustrative, but should not be considered
limiting, of those assays useful for detecting glycosyltransferase activity. Furukawa et al.
(1985, Biochem. J., 227:573-582) describe a borate-impregnated paper electrophoresis assay
and a fluorescence assay. Roth et al. (1983, Expl Cell Research 143:217-225) describe
application of the borate assay to glucuronyl transferases, previously assayed
colorimetrically. Benau et al. (1990, J. Histochem. Cytochem., 38:23-30) describe a
histochemical assay based on the reduction, by NADH, of diazonium salts. See also U.S.
Patent No. 6,284,493 of Roth, incorporated herein by reference.

EXPERIMENTAL EXAMPLES
[0197] The invention is now described with reference to the following examples. These
examples are provided for the purpose of illustration only and the invention should in no way
be construed as being limited to these examples but rather should be construed to encompass
any and all variations which become evident as a result of the teaching provided herein.

Example 1: Cloning, Expression, and Refolding of Human Polypeptide
N-acetylgalactosaminyltransferase II (GalNAcT2) in E. coli JM109

[0198] Four constructs were designed and created in order to assess the glycosyltransferase
activity of truncation mutants of human GalNAcT2. The four mutants created included Δ40,
a truncation mutant which has as its new N-terminal residue an arginine that corresponds to
R41 of the full-length human GalNAcT2 set forth in SEQ ID NO:2; Δ51, a truncation mutant
which has as its new N-terminal residue a lysine that corresponds to K52 of the full-length
human GalNAcT2 set forth in SEQ ID NO:2; Δ73, a truncation mutant which has as its new
N-terminal residue a glycine that corresponds to G74 of the full-length human GalNAcT2 set
forth in SEQ ID NO:2; and Δ94, a truncation mutant which has as its new N-terminal residue
a glycine that corresponds to G95 of the full-length human GalNAcT2 set forth in SEQ ID
NO:2.

[0199] Truncated human polypeptide N-acetylgalactosaminyltransferase II (GalNAcT2)
was expressed as maltose binding protein (MBP)-fusion proteins in inclusion bodies from
E. coli JM109 cells. The production of active enzyme was examined by refolding and assaying
against two polypeptide acceptors. Therefore, described herein is the generation of several
truncated forms of human polypeptide GalNAcT2 as maltose binding protein fusion proteins
in E. coli JM109 cells. The recombinant proteins are refolded from isolated inclusion bodies
using the Hampton Foldit screen kit (Hampton Research, Aliso Viejo, CA). All four
constructs were expressed in JM109 E. coli at levels of approximately 2 g/L culture media.

[0200] PCR (Polymerase Chain Reaction) amplifications were performed in a final reaction
volume of 50 µl containing 5 µl of template DNA (11 µg/ml, 100-fold diluted pBKS-Full
ppGalNAcT2), 40 pmol of 5'- primer and 3'- primer, 10 nmol of dNTP mixture, and 5 units
of Herculase™ Enhanced DNA Polymerase under the conditions of 31 cycles of denaturation
at 95°C for 45 seconds, annealing at 62°C for 45 seconds, and extension at 74°C for 170
seconds. PCR products were subjected to 1% agarose gel electrophoresis. DNA fragments
were excised and purified by QIAEX II gel extraction kit (Qiagen, Valencia, CA). Table 1
illustrates the primers used in the PCR reactions.

Table 1: Primers used in cloning ppGalNAcT2

Sense Primers (BamHI site: GGATCC):

For N41R (relates to Δ40):
5' CGC GGATCC AGGAAGGAGGACTGGAATG 3' (SEQ ID NO:11)

For N52K (relates to Δ51):
5' CGC GGATCC AAAAAGAAAGACCTTCATCACAGC 3' (SEQ ID NO:12)

For N74G (relates to Δ73):
5' CGC GGATCC GGGAAAGTACGGTGGCCAGAC 3' (SEQ ID NO:13)

For N95G (relates to Δ94):
5' CGC GGATCC GGGCAGGACCCTTACGCC 3' (SEQ ID NO:14)

Antisense Primer with STOP codon (XhoI site: CTCGAG; Stop: CTA):
5' CTG CTCGAG CTA CTGCTGCAGGTTGAGCG 3' (SEQ ID NO:15)
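
The primer design in Table 1 can be checked with a short script. The following is a minimal sketch, not part of the original protocol: the primer strings are transcribed from Table 1, while the function and variable names are illustrative assumptions. It confirms that each sense primer carries the BamHI site directly after its three-base clamp, and that the reverse complement of the antisense primer places a TAG stop codon immediately upstream of the XhoI site.

```python
# Primer sequences transcribed from Table 1 (names below are illustrative only).
SENSE_PRIMERS = {
    "N41R": "CGCGGATCCAGGAAGGAGGACTGGAATG",
    "N52K": "CGCGGATCCAAAAAGAAAGACCTTCATCACAGC",
    "N74G": "CGCGGATCCGGGAAAGTACGGTGGCCAGAC",
    "N95G": "CGCGGATCCGGGCAGGACCCTTACGCC",
}
ANTISENSE_PRIMER = "CTGCTCGAGCTACTGCTGCAGGTTGAGCG"

BAMHI_SITE = "GGATCC"
XHOI_SITE = "CTCGAG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_bamhi_after_clamp(primer: str, clamp_length: int = 3) -> bool:
    """True if the BamHI site starts right after the 5' clamp bases."""
    return primer[clamp_length:clamp_length + len(BAMHI_SITE)] == BAMHI_SITE


# Coding-strand view of the antisense primer: ...TAG (stop) followed by the XhoI site.
antisense_coding_view = reverse_complement(ANTISENSE_PRIMER)
stop_then_xhoi = "TAG" + XHOI_SITE in antisense_coding_view
```

Read on the coding strand, the antisense primer ends the open reading frame with TAG and then the XhoI site, which is consistent with the BamHI/XhoI cloning described in the next paragraph.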

[0201] Gel-purified PCR products were digested with BamHI and XhoI, gel purified again
and ligated into a pCWin2MBP vector previously digested by the same restriction enzymes.
The ligated products were transformed into E. coli DH5α electrocompetent cells. The
transformants were plated on LB Agar plates with 50 µg/ml kanamycin and incubated at
37°C overnight. Three colonies were picked for each construct and cultured in LB medium
containing 15 µg/ml kanamycin. Plasmid DNAs were purified by QIAprep Spin Miniprep
Kit (Qiagen, Valencia, CA) and screened by restriction mapping with BamHI and XhoI. The
plasmids having the correct digest patterns were transformed into JM109 chemically competent
cells.

[0202] JM109 cells were cultured in a 15 ml culture tube containing 6 ml LB medium and
15 µg/ml of kanamycin overnight at 37°C with rapid shaking (250 rpm). For each culture,
two milliliters of starting culture was transferred to a 50 ml centrifuge tube containing 23 ml
LB medium with 15 µg/ml kanamycin and incubated at 37°C with rapid shaking for 3 hours.
Isopropyl-1-thio-β-D-galactopyranoside (IPTG) was added to a final concentration of 0.4
mM to induce the protein expression. After shaking at 37°C (250 rpm) for another 3 hours,
cells were harvested by centrifugation at 3,500 x g for 10 minutes. The cell pellets were then
resuspended in 0.6 ml of 20 mM Tris-HCl buffer (pH 8.5) containing 1% Triton X-100.
Lysozyme (100 µg) and DNase I (2 µg) were then added. The mixture was shaken at 37°C in
an incubator shaker for 45 minutes before being transferred to a 1.5 ml microcentrifuge tube.
Lysate was separated from inclusion bodies (IB) by centrifugation at 14,000 rpm for 5
minutes.

[0203] Each sample for SDS-PAGE separation was prepared by mixing 5 µl of whole-cell
suspension, lysate, or inclusion body suspension with 5 µl of 2 x Tris-Glycine SDS sample
buffer and 1 µl of DTT (1 M). The mixture was heated at 98°C for 5 minutes, cooled to
room temperature, and loaded into a well of a 1.0 mm x 15 well 4-20% Tris-Glycine
gradient gel. The electrophoresis was conducted at 120 V for 100 minutes. The gel was then
stained for 2 hours and de-stained with distilled water (see Figures 4-6).

[0204] Inclusion bodies were dissolved at 20 mg/ml (high protein concentration) or 2
mg/ml (low protein concentration) in solubilizing buffer containing 4 M
Guanidine-HCl, 100 mM Tris-HCl, pH 9.0, 5 mM EDTA, and 10 mM DTT. Refolding of
inclusion bodies by Hampton Foldit Screen Kit was carried out by following the
manufacturer's protocol, except that a 10-fold smaller volume was used (100 µl scale)
(Hampton Products, Aliso Viejo, CA).

[0205] Non-radioactive enzyme activity assays for lysates were carried out in a 0.5 ml
microcentrifuge tube at 37°C overnight in a final volume of 10 µl containing 50 mM MES
buffer, pH 6.0, MnCl2 (15 mM), MgCl2 (15 mM), NaCl (0.15 M), UDP-GalNAc (5 mM), 1.5
µg G-CSF (acceptor), and 2.15 µl of lysate sample. Enzyme was substituted by H2O as a
negative control. Purified recombinant ppGalNAcT2 (0.5 µl) from Sf9 baculovirus
expression system was used as the positive control. An assay for refolded inclusion bodies
was performed in a similar manner as described for the lysates, except that interferon α2b (4
µg) was used as the acceptor for the enzyme and the volume of the sample added to the
reaction mixture was 5.65 µl.

[0206] DNA fragments for ppGalNAcT2 genes (about 1.5 kb) were successfully amplified
by PCR as shown in Figure 5. Vector plasmid DNA pCWin2MBP was digested by BamHI
and XhoI, and purified on a 1% agarose gel. The gel purified DNA fragment was digested by
the same two enzymes and purified. After digestion, the DNA fragments were clean as
visualized on an agarose gel (Figure 2B).

[0207] BamHI and XhoI digestion of the plasmids purified from the selected twelve
colonies showed the predicted pattern on a 1% agarose gel. The size of the vector was
around 6.2 kb, and the inserts were approximately 1.5 kb. Maltose-binding protein (MBP)
expressed in the JM109 transformed with pCWin2MBP vector plasmid showed a band at
around 43 kDa. Over 90% of the proteins in the whole cells are MBP. The #2 colony of the
construct N41R expressed a shorter protein than expected, indicating the occurrence of
mutation. All other eleven colonies showed a band at about 100 kDa for MBP-ppGalNAcT2
fusion proteins, with over 80% of the total protein being the target fusion protein.

[0208] Gel electrophoresis showed that most of the MBP was expressed in a soluble form
in cell lysate (Figures 1 and 2). The overexpressed protein in the wrong construct (colony
#2) for N41R was also observed in the cell lysate. However, most of the MBP-ppGalNAcT2
fusion proteins were in inclusion bodies (Lanes 1 and 3-12 in Figures 1 and 2). Over 90% of
the proteins in the inclusion bodies were the MBP fusion proteins of interest.

[0209] In summary, four truncated forms of human polypeptide GalNAcT2 were
successfully cloned into pCWin2MBP vector and expressed in E. coli JM109 as MBP fusion
proteins in inclusion bodies. The level of expression of enzyme in inclusion bodies was
about 2 g/L. As estimated from the SDS-PAGE, over 80% of the inclusion bodies were the
target MBP-ppGalNAcT2 fusion proteins.

Example 2: Development of Protein Refolding Conditions for E. coli-Expressed MBP-
Human GalNAcT2

[0210] Refolding experiments on MBP-GalNAcT2 were carried out on a 1 ml scale, with
four different MBP-GalNAcT2 DNA constructs and under 16 different possible refolding
conditions. Refolding was performed using the Hampton Research Foldit kit (Hampton
Research, Aliso Viejo, CA) and the assays were performed via radioactive detection of [3H]
UDP-GalNAc addition to a MuC-2 peptide and via matrix-assisted laser desorption ionization
mass spectrometry (MALDI) analysis utilizing addition of GalNAc to interferon α-2b and
G-CSF. The data illustrate that E. coli-expressed MBP-GalNAcT2 can be refolded into an
active enzyme. It appears that under refolding conditions 8 and 15, found in Hampton
Research's Foldit kit (Hampton Research, Aliso Viejo, CA), active conformations of
MBP-GalNAcT2, constructs 1 and 2, were identified. Success was indicated by the [3H]
UDP-GalNAc assay and later confirmed by interferon α-2b (IFα-2b) and granulocyte-colony
stimulating factor (G-CSF)-based glycosyltransferase assays. The specific methods and data
of this study are presented herein.

[0211] As described elsewhere herein, GalNAcT2 constructs used in the present invention
comprised DNA encoding various amino terminal amino acid truncation mutants of the
original human GalNAcT2 protein, including the following constructs, which begin with the
N-terminal amino acid as indicated:

Construct 1 - pCWin2 MBP-GalNAcT2 - R41 Arginine (924aa, 103682.5MW),
Construct 2 - pCWin2 MBP-GalNAcT2 - K52 Lysine (913aa, 102286.0MW),
Construct 3 - pCWin2 MBP-GalNAcT2 - G74 Glycine (891aa, 99799.3MW), and
Construct 4 - pCWin2 MBP-GalNAcT2 - G95 Glycine (870aa, 97419.8MW).
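
As a consistency check on the construct list above, the drop in fusion-protein length between successive constructs should equal the number of additional residues truncated. The following is a minimal sketch; the dictionary and function names are illustrative assumptions, and only the residue positions and lengths come from the list.

```python
# First retained residue of full-length GalNAcT2 -> reported fusion length (aa),
# taken from the construct list above.
FUSION_LENGTH_BY_START = {41: 924, 52: 913, 74: 891, 95: 870}


def lengths_track_truncations(length_by_start: dict) -> bool:
    """True if each extra truncated residue shortens the fusion by exactly one residue."""
    starts = sorted(length_by_start)
    return all(
        length_by_start[a] - length_by_start[b] == b - a
        for a, b in zip(starts, starts[1:])
    )


consistent = lengths_track_truncations(FUSION_LENGTH_BY_START)
```

For the values listed, the differences are 11, 22 and 21 residues, matching the steps from R41 to K52, K52 to G74, and G74 to G95.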

[0212] Constructs were first expanded to 2 ml starter cultures by inoculating 2 ml of
Martone L-Broth containing 10 µg/ml Kanamycin sulfate with a pipette tip scraping from the
particular glycerol stock culture. This procedure was performed on all four constructs for a
total of four starter cultures. Starter cultures were incubated overnight at 37°C, with rotary
shaking at 250rpm. From the overnight cultures, four 275 ml Martone L-Broth cultures
containing 10 µg/ml Kanamycin sulfate were prepared. Each of these cultures was inoculated
with 275 µL of one of the 2 ml starter cultures of constructs 1 through 4. These 275 ml
cultures were incubated overnight at 37°C, with shaking at 250rpm.

[0213] Lastly, four 1 L Martone L-Broth cultures containing 10 µg/ml Kanamycin sulfate
were prepared. Each of these cultures was inoculated with 40 ml of one of the 275 ml
cultures of constructs 1 through 4. These 1 L cultures were incubated at 37°C, with shaking at
250rpm, until the OD600 measured approximately 1.0. Upon reaching this point, IPTG was
added to each of the four 1 L cultures to a final concentration of 0.4mM. Cultures were then
allowed to incubate overnight at 37°C, with shaking at 250rpm.

[0214] One-liter cultures containing JM109 pCWin2 MBP-GalNAcT2 constructs,
designated numbers 1 through 4, were transferred to 1 L centrifuge bottles. Cultures were then
centrifuged at 5000rpm for 30 minutes at 4°C. Supernatants were removed and the pellets
were weighed. The pellets from each sample were then washed to isolate the inclusion bodies
(IBs). The pellet of each construct was first resuspended in 15 ml of 20mM Tris-HCl
pH=8.5, 5mM EDTA and then lysed by two passages through a French press at 12,000psi.

[0215] The lysates for each construct were then centrifuged at 5000rpm, 25°C for 5
minutes in 50 ml disposable tubes. The supernatants were removed and the pellets were
resuspended in 25 ml of 20mM Tris-HCl pH=8.5, 1% Triton X-100. The suspensions were
incubated at room temperature for 10 minutes. The suspensions were then centrifuged at
5000 rpm, 25°C for 5 minutes. The supernatants were then removed and the samples were
resuspended for a second time in 25 ml of 20mM Tris-HCl pH=8.5, 1% Triton X-100 and
allowed to incubate at room temperature for 10 minutes. The suspensions were again
centrifuged at 5000 rpm, 25°C for 5 minutes. The supernatants were removed and a third
wash was performed by resuspending the pellets in 25 ml of 20mM Tris-HCl pH=8.5, 1%
Triton X-100. The suspensions sat at room temperature for 10 minutes and then were
centrifuged at 5000 rpm, 25°C for 5 minutes. The supernatants from each sample were
removed and the pellets were weighed. The pellets were then diluted to 20mg/ml by
resuspending them in the appropriate volume of 20mM Tris-HCl pH=8.5, 5mM EDTA.
One-ml aliquots were made from these suspensions for each of the four constructs and stored
at -20°C. These aliquots represent the triple washed IBs or "TWIBs."

[0216] Solubilization buffer was prepared with the following constituents: 6M Guanidine
HCl, 5mM EDTA, 50mM Tris-HCl pH=8 and 10mM DTT. 1 ml of this solution was added
to a 20mg aliquot of TWIBs to yield a 20mg/ml solution. The solution was incubated
overnight on the bench top to solubilize IBs. This procedure was performed on a TWIB
aliquot of each MBP-GalNAcT2 construct to provide protein for refolding experiments.

[0217] To screen refolding conditions that may result in an active form of E. coli-expressed
MBP-GalNAcT2, a Hampton Foldit Screening kit was utilized (Hampton Products, Aliso
Viejo, CA). The composition of each of the refold buffers is found in Table 2.

Table 2: Refold Conditions from Hampton Research Foldit kit (Hampton Research, Aliso
Viejo, CA)

[0218] For a given refold condition, 950 µL of refold buffer was combined with 50 µL of
solubilized protein (for high protein concentration conditions) or 995 µL of refold buffer was
combined with 5 µL of solubilized protein (for low protein concentration conditions).
Refolding reactions were placed on a rotary shaker in the cold room (4°C) overnight.
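
The protein concentration in each refolding reaction follows directly from the dilution described above. The following is a minimal sketch, assuming the solubilized protein is the 20 mg/ml TWIB solution described earlier in this example; the function name is hypothetical.

```python
def refold_protein_mg_per_ml(stock_mg_per_ml: float,
                             protein_ul: float,
                             buffer_ul: float) -> float:
    """Protein concentration after mixing solubilized protein into refold buffer."""
    return stock_mg_per_ml * protein_ul / (protein_ul + buffer_ul)


# Assumption: the solubilized stock is 20 mg/ml.
STOCK_MG_PER_ML = 20.0

high_condition = refold_protein_mg_per_ml(STOCK_MG_PER_ML, 50.0, 950.0)  # 50 uL + 950 uL
low_condition = refold_protein_mg_per_ml(STOCK_MG_PER_ML, 5.0, 995.0)    # 5 uL + 995 uL
```

Under that assumption, the high and low protein conditions correspond to about 1 mg/ml and 0.1 mg/ml in the 1 ml refolding reaction.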

[0219] From results obtained in the screen, it was determined that refold conditions 3, 8,
11, 12, 15 and 16 yielded the most promising results for constructs 1 and 2. Additional
refolding reactions were performed under those conditions using G-50 gel filtration
instead of dialysis to yield more concentrated protein refold samples (See Refold Purification
section for methods). From those experiments, further refinement was achieved and
conditions 8 and 15 were found to be optimal. More specifically, condition 15 was optimal in
an overnight incubation with rotation, and condition 8 was found to be optimal when left
still in a 5 day incubation.

[0220] Protein refold samples were first purified by dialysis against 20mM Tris-HCl,
pH=8.5. 100 µL of each refold sample was dialyzed. Dialysis was conducted in a beaker
containing 20mM Tris-HCl pH=8.5 with slow stirring. Samples were placed at 4°C and
allowed to dialyze overnight. Resulting retentate was used in a radioactive activity assay, as
discussed elsewhere herein. As an alternative method to yield more concentrated protein
samples, MBP-GalNAcT2 refold samples were purified by use of G-50 Macro Spin Columns
(Harvard Bioscience, Holliston, MA). Caps were removed from the G-50 columns and
columns were placed into 2 ml microcentrifuge tubes. H2O (500 µl) was added to each
column and the columns were allowed to incubate for 15 minutes to hydrate. The columns
were then centrifuged at ~2000 x g for 4 minutes after which they were transferred to new 2
ml centrifuge tubes. Each refold solution (150 µl) was applied to one of the columns.
Columns were then centrifuged at 2000 x g for ~2 minutes. Resulting permeates represented
the purified refold samples.

[0221] A radiolabeled [3H]-UDP-GalNAc assay was performed to determine the activity of
the E. coli-expressed refolded MBP-GalNAcT2 by monitoring the addition of radiolabeled
GalNAc to a peptide acceptor. The acceptor was a MuC-2-like peptide having the sequence
MVTPTPTPTC (SEQ ID NO:16). The peptide was dissolved in 1M Tris-HCl pH=8.0. The
initial screen was performed on refolded protein samples which had been purified by dialysis.
Subsequent refold samples were freshly refolded and purified by G-50 gel filtration. The
assay included protein refold samples, GalNAcT2 from Baculovirus as a positive control, a
negative control sample with all the components except enzyme and a maximum input
sample which contained all components except enzyme. A total of 19 samples were tested.
The assay solution consisted of the components listed in Table 3:



Table 3: GalNAcT2 assay reaction composition.

Component                         Dilution          Volume (µl)   Final Concentration
0.25M Tris-HCl                    N/A               5             25mM
2.5% Triton X-100                 N/A               5             0.25%
100mM MnCl2                       N/A               5             10mM
[3H] UDP-GalNAc, 0.1mCi/ml        0.5µl in 4.5µl    5             50nCi
1mM UDP-GalNAc                    N/A               5             0.1mM
10mM MuC2 Peptide                 0.5µl in 4.5µl    5             0.1mM
Enzyme                                              20
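
The final concentrations in Table 3 follow from the stock concentration, any pre-dilution, and the 50 µl total reaction volume (30 µl of reaction mixture plus 20 µl of enzyme sample). The following is a minimal sketch of that arithmetic; the helper name and tuple layout are illustrative assumptions, and "0.5 µl in 4.5 µl" is read as a 10-fold pre-dilution.

```python
TOTAL_VOLUME_UL = 50.0  # 30 ul reaction mixture + 20 ul enzyme sample


def final_concentration(stock: float, predilution_fold: float,
                        volume_ul: float, total_ul: float = TOTAL_VOLUME_UL) -> float:
    """Final concentration in the same unit as `stock`."""
    return stock / predilution_fold * volume_ul / total_ul


# (component, stock value, unit, pre-dilution fold, volume added in ul)
COMPONENTS = [
    ("Tris-HCl", 250.0, "mM", 1.0, 5.0),
    ("Triton X-100", 2.5, "%", 1.0, 5.0),
    ("MnCl2", 100.0, "mM", 1.0, 5.0),
    ("UDP-GalNAc", 1.0, "mM", 1.0, 5.0),
    ("MuC2 peptide", 10.0, "mM", 10.0, 5.0),  # 0.5 ul in 4.5 ul -> 10-fold
]

final = {name: final_concentration(stock, fold, vol)
         for name, stock, _unit, fold, vol in COMPONENTS}

# Radiolabel is reported as total activity: 0.1 mCi/ml equals 100 nCi/ul,
# diluted 10-fold, with 5 ul added to the reaction.
tritium_nci = 100.0 / 10.0 * 5.0
```

These values agree with the "Final Concentration" column: 25 mM Tris-HCl, 0.25% Triton X-100, 10 mM MnCl2, 0.1 mM UDP-GalNAc, 0.1 mM peptide, and 50 nCi of label per reaction.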



[0222] For each of the refold samples, 30 µL of the reaction mixture was combined with
20 µL of the refold sample. For the negative control, 20 µL of H2O was combined with 30 µL
of the reaction mixture. For the positive control, 1 µL of GalNAcT2 Baculovirus enzyme and
19 µL of H2O were added to 30 µL of the reaction mixture. For the "maximum
input" sample, 30 µL of the reaction mixture was combined with 20 µL of dH2O. Reactions
were incubated at 37°C for 30 minutes. 100 ml DOWEX AG 1X8 (chloride form) was
washed by combining 100 ml of resin and 100 ml of H2O and mixing well. The water was
poured off the resin and another 100 ml of H2O was added, mixed and removed. The resin
was resuspended one final time in 100 ml of dH2O. After the GalNAcT2 assay reaction had
incubated for 30 minutes, 1 ml of resuspended resin in H2O was added to each reaction
(except for the maximum input sample). Samples were vortexed briefly and then loaded into
filter columns and allowed to drain by gravity into scintillation vials. 5 ml of scintillation
solution was added to each of the samples and standards. Samples were shaken briefly,
loaded on the scintillation counter, and radioactivity was measured.

[0223] An IFα-2b assay was performed to determine whether E. coli-expressed refolded
MBP-GalNAcT2 could transfer GalNAc to an interferon α-2b acceptor from a UDP-GalNAc
donor. From data obtained in the refold screen (see the [3H]UDP-GalNAcT2 assay
description elsewhere herein), it was shown that MBP-GalNAcT2 constructs 1 and 2 in refold
buffers 8 and 15 yielded the most active enzymes, as determined by the radioactive assay.
Therefore, in the IFα-2b assay, constructs 1 and 2 in refold buffers 8 and 15 were assayed for
transferase activity. Additionally, as a positive control, GalNAcT2 from a Baculovirus
system was assayed as well.

[0224] The assay consisted of reaction buffer (27mM MES, pH=7, 200mM NaCl, 20mM
MgCl2, 20mM MnCl2, and 0.1% Tween 80), IFα-2b Protein (2mg/ml in 50mM MES pH=6,
150mM NaCl, 0.05% Tween 80, 0.05% NaN3), and 100mM UDP-GalNAc. The assay
solution was prepared as shown in Table 4 for each reaction.

Table 4: Parameters for IFα-2b acceptor GalNAcT2 activity assay

Reaction Components       Reaction Component Volumes    Final Concentration
MES, pH=7                 5µl from Rxn Buffer           20mM
NaCl                                                    150mM
MgCl2                                                   5mM
MnCl2                                                   5mM
Tween 80                                                0.05%
2mg/ml IFα-2b Protein     10µl                          1mg/ml
100mM UDP-GalNAc          0.6µl                         3mM

(The 5µl from Rxn Buffer supplies MES, NaCl, MgCl2, MnCl2 and Tween 80; additional
concentration from IFα-2b dilution buffer.)



[0225] For each refold sample, 4.4 µL of sample were added to 15 µL of reaction solution.
For the positive control, 1 µL of standard GalNAcT2 Baculovirus was added along with
3.4 µL of H2O to one tube. Reactions were incubated at 32°C on a rotary shaker for several
days, during which time an overnight time point and a 5 day time point were assayed by
MALDI.

[0226] The above assay was also performed to determine whether E. coli-expressed refolded
MBP-GalNAcT2 could transfer GalNAc to a G-CSF acceptor from a UDP-GalNAc donor. As
above, construct 2 in refold buffer 8 was assayed for GalNAcT2 activity. Additionally, as a
positive control, GalNAcT2 from Baculovirus was assayed. The assay consisted of reaction
buffer (27mM MES, pH=7, 200mM NaCl, 20mM MgCl2, 20mM MnCl2, and 0.1% Tween
80), G-CSF Protein (2mg/ml in H2O), and 100mM UDP-GalNAc. The assay solution was
prepared for each reaction as shown in Table 5.

Table 5: Parameters for G-CSF acceptor GalNAcT2 activity assay

Reaction Components       Reaction Component Volumes    Final Concentration
MES, pH=7                 5µl of Rxn Buffer             20mM
NaCl                                                    150mM
MgCl2                                                   5mM
MnCl2                                                   5mM
Tween 80                                                0.05%
2mg/ml G-CSF              10µl                          1mg/ml
100mM UDP-GalNAc          0.6µl                         3mM



[0227] For the refold sample, 4.4 µL of sample were added to 15 µL of reaction solution.
For the positive control, 1 µL of standard GalNAcT2 Baculovirus was added along with
3.4 µL of H2O to one tube. Reactions were incubated at 32°C on a rotary shaker for 4 days, at
the end of which a sample was taken and assayed by MALDI.

[0228] Pellet weights and inclusion body weights were determined for each of the four 1 L
JM109 pCWin2 MBP-GalNAcT2 cultures (constructs 1 through 4), as shown in Table 6.

Table 6: Cell pellet weights versus inclusion body weights

Pellet and Inclusion Body Weights from 1 L JM109 pCWin2 MBP-GalNAcT2 Cultures

JM109 pCWin2 MBP-GalNAcT2 Construct   Cell Pellet Weight (g)   Inclusion Body Weight (g)
1                                     5.04                     2.04
2                                     5.24                     2.19
3                                     4.89                     2.42
4                                     4.30                     2.44
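
The weights in Table 6 can be expressed as the fraction of each wet cell pellet recovered as washed inclusion bodies. The following is a minimal sketch of that calculation; the variable names are illustrative, and the percentages are derived here from the tabulated weights rather than reported in the original text.

```python
# (cell pellet weight in g, inclusion body weight in g) per construct, from Table 6.
WEIGHTS_G = {
    1: (5.04, 2.04),
    2: (5.24, 2.19),
    3: (4.89, 2.42),
    4: (4.30, 2.44),
}


def inclusion_body_percent(pellet_g: float, inclusion_body_g: float) -> float:
    """Inclusion body weight as a percentage of the cell pellet weight."""
    return 100.0 * inclusion_body_g / pellet_g


percent_by_construct = {
    construct: round(inclusion_body_percent(pellet, ib), 1)
    for construct, (pellet, ib) in WEIGHTS_G.items()
}
```

On these numbers, the recovered inclusion body fraction rises from roughly 40% of pellet weight for construct 1 to roughly 57% for construct 4.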



[0229] The expression of MBP-GalNAcT2 was observed by way of the SDS-PAGE gel
analysis of JM109 pCWin2 MBP-GalNAcT2 whole cell samples before and after induction
by IPTG (Figure 7). The protein gel shows a clear increase in protein expression in the
induced state compared to the uninduced state. Furthermore, there is a distinct band at
~100kDa that substantially increases after induction, which correlates to the expected size of
the MBP-GalNAcT2 band.

[0230] Protein samples were diluted by combining 950 µL of H2O with 50 µL of protein
sample. Samples were then analyzed using a UV spectrophotometer. Protein concentration
was calculated from absorption values and the molar extinction coefficients: Construct 1 -
0.65mg/ml per 1 A280 unit; Construct 2 - 0.64mg/ml per 1 A280 unit, as shown in Table 7.

Table 7: Protein concentration of 1 L JM109 pCWin2 MBP-GalNAcT2 Cultures after
Solubilization and G-50 Purification

pCWin2 MBP-GalNAcT2   A280 After       Protein Concentration   A280 After G-50   Protein Concentration
Construct             Solubilization   (mg/ml)                 Purification      (mg/ml)
1                     0.2827           2.5                     0.0100            0.156
2                     0.2531           2.4                     0.0160            0.102



[0231] Inclusion bodies obtained from JM109 pCWin2 MBP-GalNAcT2 constructs 1 and 2
were analyzed using SDS-PAGE to verify the presence of MBP-GalNAcT2. The protein was
clearly observed in both lanes of the gel, running at approximately 100kDa (Figure 8).

[0232] All four constructs were tested in a [3H]UDP-GalNAcT2 assay under all 16 refold
conditions available in the Hampton Foldit kit (Hampton Research, Aliso Viejo, CA).
Refolded truncated enzymes were purified by dialysis and then tested for activity using the
radioactive assay, as shown in Table 8.

Table 8: Results of the GalNAcT2 activity assay for refolded proteins

Raw CPM

Refold Condition   Colony 1   Colony 2   Colony 3   Colony 4   Average for Refold Condition
1                  112        119        125        122        119.5
2                  119        121        113        123        119.0
3                  155        251        207        125        184.5
4                  150        143        139        150        145.5
5                  131        132        96         132        122.8
6                  168        156        123        120        141.8
7                  167        160        143        119        147.3
8                  243        221        170        135        192.3
9                  111        121        100        127        114.8
10                 144        166        110        121        135.3
11                 218        230        134        148        182.5
12                 218        184        184        154        185.0
13                 166        139        143        121        142.3
14                 114        137        114        95         115.0
15                 214        224        174        157        192.3
16                 194        222        180        165        190.3

Negative Control   102
Positive Control   1585

Corrected CPM

Refold Condition   Colony 1   Colony 2   Colony 3   Colony 4   Average for Refold Condition
1                  10         17         23         20         17.5
2                  17         19         11         21         17.0
3                  53         149        105        23         82.5
4                  48         41         37         48         43.5
5                  29         30         -6         30         20.8
6                  66         54         21         18         39.8
7                  65         58         41         17         45.3
8                  141        119        68         33         90.3
9                  9          19         -2         25         12.8
10                 42         64         8          19         33.3
11                 116        128        32         46         80.5
12                 116        82         82         52         83.0
13                 64         37         41         19         40.3
14                 12         35         12         -7         13.0
15                 112        122        72         55         90.3
16                 92         120        78         63         88.3



[0233] Results from this assay indicated that refold conditions 3, 8, 11, 12, 15 and 16
provided the highest CPM and therefore the greatest potential GalNAcT2 activity.
Furthermore, it appeared that construct 2 yielded the greatest number of positive hits in this
assay; therefore, efforts were focused on this construct.
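
The relationship between the raw and corrected counts in Table 8 can be sketched as a simple background subtraction followed by averaging. The snippet below is only an illustration: it assumes the corrected CPM is the raw CPM minus the negative-control CPM (102), which is consistent with the tabulated values, and the function and variable names are hypothetical.

```python
from statistics import mean

# Negative-control CPM reported in Table 8 (used here as the background).
NEGATIVE_CONTROL_CPM = 102

# Raw CPM for refold condition 3, colonies 1-4, as listed in Table 8.
raw_cpm_condition_3 = [155, 251, 207, 125]


def corrected_cpm(raw_values, background):
    """Subtract the background (negative control) from each raw count."""
    return [value - background for value in raw_values]


corrected = corrected_cpm(raw_cpm_condition_3, NEGATIVE_CONTROL_CPM)
average_corrected = mean(corrected)

print(corrected)          # per-colony corrected CPM
print(average_corrected)  # average for the refold condition
```

Applied to condition 3, this reproduces the per-colony corrected counts and the condition average listed in Table 8.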
Raw CPM

Refold Condition   3      8      11     12     15     16
Construct 2        924    1197   689    1585   1701   1561

Negative Control   277
Positive Control   4919   (1 µl of 200 µg/ml STD Enzyme)

Corrected CPM

Refold Condition   3      8      11     12     15     16
Construct 2        647    920    412    1308   1424   1284

Activity:

$$\mathrm{U/L} = \frac{\Delta\mathrm{CPM} \times (\text{nmoles Donor}) \times 1000\ \mu\mathrm{l/ml}}{(\text{Input CPM}) \times (0.35/0.55) \times (\text{Assay Incubation Time (minutes)}) \times (\text{Volume Enzyme }(\mu\mathrm{l}))}$$

nmoles Donor (UDP-GalNAc)         5
Assay Incubation Time (minutes)   30
Volume Enzyme (µl)                20
Maximum Input                     48998

Activity U/L

Refold Condition   3      8      11     12     15     16     Positive Control
Construct 2        0.17   0.25   0.11   0.35   0.38   0.34   26.29


[0234] In this assay, construct 2 was tested under refold conditions 3, 8, 11, 12, 15 and 16
from the Hampton Foldit kit (Hampton Research, Aliso Viejo, CA). These refolded enzymes
were purified by G-50 gel filtration and then tested for activity by the radioactive assay.
Results indicate that after overnight incubation on a rotator, greatest activity was obtained
from refold condition 15.
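
As a worked illustration of the activity calculation, the sketch below evaluates the U/L formula for the corrected CPM values of construct 2. It assumes the volume conversion factor in the numerator is 1000 µl/ml, which is the reading that reproduces the reported activities; the function name and argument names are hypothetical, and the 0.35/0.55 factor is taken directly from the formula without further interpretation.

```python
def activity_units_per_liter(delta_cpm, input_cpm, nmoles_donor=5.0,
                             incubation_min=30.0, enzyme_volume_ul=20.0,
                             formula_factor=0.35 / 0.55):
    """Evaluate the U/L formula given with the activity tables.

    Assumption: the numerator carries a 1000 ul/ml conversion factor.
    """
    numerator = delta_cpm * nmoles_donor * 1000.0
    denominator = input_cpm * formula_factor * incubation_min * enzyme_volume_ul
    return numerator / denominator


# Corrected CPM for construct 2 after overnight refolding, keyed by refold condition.
corrected_cpm = {3: 647, 8: 920, 11: 412, 12: 1308, 15: 1424, 16: 1284}
MAXIMUM_INPUT_CPM = 48998

activities = {
    condition: round(activity_units_per_liter(cpm, MAXIMUM_INPUT_CPM), 2)
    for condition, cpm in corrected_cpm.items()
}

for condition, activity in activities.items():
    print(f"Refold condition {condition}: {activity:.2f} U/L")
```

With these inputs the computed activities agree with the tabulated values for conditions 3 through 16, with condition 15 the highest.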
Table 10: GalNAcT2 activity results from 5 day refolding experiment

Raw CPM - 5 Day Refold

Refold Condition   3      8      11     12     15     16
Construct 2        185    2288   186    226    496    270

Negative Control   129
Positive Control   3612   (1 µl of 200 µg/ml STD Enzyme)

Corrected CPM - 5 Day Refold

Refold Condition   3      8      11     12     15     16
Construct 2        56     2159   57     97     367    141

Activity:

$$\mathrm{U/L} = \frac{\Delta\mathrm{CPM} \times (\text{nmoles Donor}) \times 1000\ \mu\mathrm{l/ml}}{(\text{Input CPM}) \times (0.35/0.55) \times (\text{Assay Incubation Time (minutes)}) \times (\text{Volume Enzyme }(\mu\mathrm{l}))}$$

nmoles Donor (UDP-GalNAc)         5
Assay Incubation Time (minutes)   30
Volume Enzyme (µl)                20
Maximum Input                     47527

Activity U/L - 5 Day Refold

Refold Condition   3      8      11     12     15     16     Positive Control
Construct 2        0.02   0.59   0.02   0.03   0.10   0.04   19.90


[0235] In this assay, construct 2 was tested under refold conditions 3, 8, 11, 12, 15 and 16
from the Hampton Foldit kit (Hampton Research, Aliso Viejo, CA) after being rotated
overnight at 4°C and left resting at 4°C for 5 days. These refolded enzymes were purified by
G-50 gel filtration and then tested for activity by the radioactive assay. Results indicated that
after 5 days in refold buffer 8, construct 2 displayed the highest activity. Therefore it was
determined that conditions 8 and 15 had the greatest potential for producing a properly folded
and active MBP-GalNAcT2.

[0236] An IFα-2b assay was performed on overnight refolds of constructs 1 and 2 in refold
buffer 15 (1-15 and 2-15, respectively) and was incubated at 32°C for 5 days. Time points
were taken of the IFα-2b reaction at 16 hours and 5 days. The results indicate that the
parental peak for IFα-2b is at MW ~19267. A successful reaction would be indicated by
addition of ~203 molecular weight to that peak. From the 5 day data for refolds 1-15 and
2-15, a developing peak was observed at ~19478 and ~19473 respectively, a difference of
approximately 203 MW. These data illustrated that GalNAc was added to IFα-2b by the
refolded GalNAcT2 protein, thereby confirming the activity that was reported elsewhere
herein by the radioactive assay.

[0237] Additionally, the IFα-2b assay was performed with the 5-day refolded enzymes of
constructs 1 and 2 in refold buffer 8 (1-8 and 2-8, respectively). The IFα-2b reactions were
again allowed to incubate at 32°C for 3 days. Reactions were analyzed at the 3 day time
point. The results indicated that the parental peak for IFα-2b is at MW ~19263. A successful
reaction would be indicated by the addition of ~203 molecular weight to that peak. From the
3 day data for refolds 1-8 and 2-8 a developing peak is seen at ~19462 and ~19469
respectively, again a difference of approximately 203 MW. These data again indicated that
GalNAc was added to IFα-2b by the refolded GalNAcT2 protein and confirmed what was
reported by the radioactive assay.
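
The mass-shift reasoning used to interpret the IFα-2b data can be sketched as follows. This is a minimal illustration using the peak values reported for refolds 1-8 and 2-8; the helper names and the ±10 tolerance are hypothetical choices made only for the example, not values taken from the assay.

```python
# Approximate mass added by one GalNAc residue, as stated in the text.
EXPECTED_GALNAC_SHIFT = 203


def mass_shift(parent_mw, product_mw):
    """Difference between the developing peak and the parental peak."""
    return product_mw - parent_mw


def consistent_with_one_galnac(parent_mw, product_mw, tolerance=10):
    """True if the observed shift is within `tolerance` of ~203.

    The tolerance is an illustrative assumption.
    """
    return abs(mass_shift(parent_mw, product_mw) - EXPECTED_GALNAC_SHIFT) <= tolerance


PARENT_PEAK = 19263  # parental IFα-2b peak in the 3 day experiment
developing_peaks = {"1-8": 19462, "2-8": 19469}

shifts = {name: mass_shift(PARENT_PEAK, mw) for name, mw in developing_peaks.items()}
flags = {name: consistent_with_one_galnac(PARENT_PEAK, mw)
         for name, mw in developing_peaks.items()}

print(shifts)
print(flags)
```

Both observed shifts fall within a few mass units of 203, in line with the addition of a single GalNAc residue.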

[0238] A G-CSF assay was performed on the 5-day refolded enzymes of construct 2 in
refold buffer 8. The G-CSF reaction was allowed to incubate at 32°C for 4 days. The
reaction was analyzed at the 4 day time point. The parental peak for G-CSF is expected at
MW ~18786. A successful reaction would be indicated by addition of ~203 molecular
weight to that peak. From the 3 day data for refolded enzymes 2-8, a developing peak was
observed at ~19001, a difference of approximately 203 MW. These data again indicated that
GalNAc was added to G-CSF by the refolded GalNAcT2 protein and confirmed what was
reported by the radioactive assay and the IFα-2b assay as reported elsewhere herein.

[0239] In summary, the data presented herein illustrate that E. coli-expressed
MBP-GalNAcT2 can be refolded into an active enzyme. Under refold conditions 8 and 15, found
in Hampton Research's Foldit kit (Hampton Research, Aliso Viejo, CA), active
conformations of MBP-GalNAcT2 constructs 1 and 2 were obtained. The generation of a
functional refolded protein was shown using radioactive, IFα-2b and G-CSF assays, which
demonstrated the transfer of GalNAc to a polypeptide by GalNAcT2 truncation mutants of
the present invention.

[0240] As discussed elsewhere herein, GalNAcT2 truncation mutants of the present
invention are also useful for the transfer of a glycosyl-polyethyleneglycol ("glycosyl-PEG")
conjugate to a polypeptide, also known as "glycoPEGylation" of a polypeptide. Using a
purified, refolded Δ51 GalNAcT2-MBP fusion made according to the present invention, it
was shown that Δ51 GalNAcT2-MBP is capable of transferring a GalNAc-sialic acid (SA)-PEG
conjugate to G-CSF.

[0241] A glycoPEGylation reaction mixture was prepared in order to glycoPEGylate
G-CSF. The reaction mixture contained 5 µl of Δ51 GalNAcT2-MBP (20 µU), 2 µl of
GalNAc-α2,6-sialyltransferase (ST6GalNAcI), 6.25 mM MnCl2, 15 mM UDP-GalNAc, 0.75 mM
CMP-SA-PEG (20K), and between 2 µl and 10 µl of 2 mg/ml G-CSF. Gel electrophoresis of
the reaction products demonstrated that Δ51 GalNAcT2-MBP transferred a GalNAc-sialic
acid (SA)-PEG conjugate to G-CSF (Figure 9).

Example 3: Optimization of Purification and Refolding of Δ51 GalNAcT2-MBP
[0242] Δ51 GalNAcT2 refolding and purification development as set forth herein
demonstrates the utility of a two column purification procedure for purification of GalNAcT2
mutants. The use of Q Sepharose Fast Flow in binding mode and Q Sepharose XL in binding
and flow through mode as an initial purification step has been explored. Q Sepharose XL in
flow through mode using a NaCl concentration of 100 mM in the load led to best recovery and
purity of active Δ51 GalNAcT2-MBP. The use of Hydroxyapatite Type I has been
considered as a second column step. Initial data indicate Δ51 GalNAcT2-MBP binds to this
resin and can be eluted as an active enzyme with a phosphate gradient.

[0243] Δ51 GalNAcT2-MBP was cloned and expressed as set forth elsewhere herein. To
produce double-washed inclusion bodies (DWIBs) containing the expressed Δ51 GalNAcT2-MBP,
harvested cell pellet was resuspended in 10 mM Tris/ 5 mM EDTA pH 7.5 (5 mL/g
cells) and lysed in two passes using a microfluidizer at 12,000 psi. Inclusion bodies were
harvested by centrifugation at 6,000 rpm for 20 min in a Sorvall RC-3B. The pellet was
washed twice by resuspension in the above buffer at 5 mL/g pellet followed by centrifugation at
6,000 rpm for 20 min. DWIBs were aliquoted and stored at -20°C.

[0244] Initial studies indicated that urea solubilization leads to higher Δ51 GalNAcT2-MBP
activities of refolded material than does guanidine hydrochloride solubilization.
Therefore, Δ51 GalNAcT2-MBP was solubilized in 7 M urea/ 50 mM Tris/ 10 mM DTT/ 5 mM
EDTA pH 8.0 for all subsequent experiments.

1. Refolding experiments - pH scout

[0245] A pH scout was performed to identify the best pH for Δ51 GalNAcT2-MBP
refolding.

Table 11: Reaction parameters for pH scouting of Δ51 GalNAcT2-MBP refolding
conditions



Sample ref. no.: 


i D 


1 a 
la 






■5 


MES (mM) 


50 


50 








Tris (mM) 






50 


50 




L-Arginine (mM) 


550 


550 


550


550




NaCl (mM)








250 


10 


KCl (mM) 


10 


10 


10 


10 




PEG 3350 (%) 


0.05 


0.05 


0.05 


0.05 


0.05 


L-cysteine (mM) 


4 


4 


4 


4 


4 


L-cystamine dihydrochloride
(mM) 


1 


1 


1 


1 


1 


MnCl2 (mM) 










1 


pH 


5.5 


6.5 


8.0 


8.5 


8.0 



[0246] Δ51 GalNAcT2-MBP refolds were performed by solubilizing 2.5 g of DWIBs in
250 mL of 7 M urea/ 50 mM Tris/ 10 mM DTT/ 5 mM EDTA pH 8.0 at 4°C. 50 mL of solubilized
Δ51 GalNAcT2-MBP DWIBs were added to 1 L of refold buffer at 4°C while stirring (21-fold
dilution, approximately 0.5 mg/mL). Refolding was allowed to proceed for 20.5 h at 4°C with stirring.
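
The dilution figures quoted above follow from simple arithmetic, sketched below. The calculation assumes the 0.5 mg/mL value refers to DWIB mass per refold volume (not a measured protein concentration); the variable names are hypothetical.

```python
# Quantities stated in the refolding procedure.
dwib_mass_g = 2.5                 # DWIBs solubilized
solubilization_volume_ml = 250.0  # volume of urea solubilization buffer
aliquot_volume_ml = 50.0          # solubilized material added to the refold
refold_buffer_volume_ml = 1000.0  # 1 L of refold buffer

# Concentration of DWIB material in the solubilization buffer (mg/mL).
stock_mg_per_ml = dwib_mass_g * 1000.0 / solubilization_volume_ml

# Dilution factor = final volume / volume of material added.
dilution_factor = (refold_buffer_volume_ml + aliquot_volume_ml) / aliquot_volume_ml

# Resulting concentration in the refold (mg/mL).
refold_mg_per_ml = stock_mg_per_ml / dilution_factor

print(f"stock: {stock_mg_per_ml:.1f} mg/mL")
print(f"dilution: {dilution_factor:.0f}-fold")
print(f"refold: {refold_mg_per_ml:.2f} mg/mL")
```

Under this assumption the 50 mL addition to 1 L corresponds to a 21-fold dilution and roughly 0.5 mg/mL in the refold.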

[0247] Refolds were filtered using a Cuno Zeta Plus BioCap (Cuno, Meriden, CT),
concentrated 4-fold and diafiltered on a 1 ft² 50 kDa MWCO TFF (regenerated cellulose)
filter at constant volume with 5 diavolumes of 10 mM Tris/ 5 mM NaCl pH 8.

[0248] Concentrated and diafiltered refolds were loaded onto a pre-equilibrated 48 mL Q
Sepharose Fast Flow column (Amersham Biosciences, Piscataway, NJ) and washed with 2
column volumes (CVs) of low salt buffer (10 mM Tris/ 5 mM NaCl pH 8.0). Protein was eluted
with a 15 CV gradient from 0 to 50% high salt buffer (10 mM Tris/ 1 M NaCl pH 8.0) followed
by a 1 CV gradient to 100% high salt buffer. The column was regenerated with 0.5 M NaOH.
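
For orientation, the nominal NaCl concentration at any point of the two-segment gradient can be estimated as sketched below. The sketch assumes an ideal linear gradient with no mixing or system delay volume, counts column volumes from the start of the gradient (after the wash), and uses hypothetical function names.

```python
LOW_SALT_NACL_MM = 5.0      # low salt buffer: 10 mM Tris / 5 mM NaCl pH 8.0
HIGH_SALT_NACL_MM = 1000.0  # high salt buffer: 10 mM Tris / 1 M NaCl pH 8.0


def percent_high_salt(gradient_cv):
    """Percent high salt buffer at a given gradient position in column volumes.

    Segment 1: 0 to 50% over 15 CV. Segment 2: 50 to 100% over 1 CV.
    """
    if gradient_cv < 0 or gradient_cv > 16:
        raise ValueError("gradient position must be between 0 and 16 CV")
    if gradient_cv <= 15:
        return 50.0 * gradient_cv / 15.0
    return 50.0 + 50.0 * (gradient_cv - 15.0)


def nominal_nacl_mm(gradient_cv):
    """Nominal NaCl concentration (mM) assuming ideal linear blending."""
    fraction = percent_high_salt(gradient_cv) / 100.0
    return LOW_SALT_NACL_MM + (HIGH_SALT_NACL_MM - LOW_SALT_NACL_MM) * fraction


for cv in (0, 3, 15, 16):
    print(f"{cv:>2} CV: {percent_high_salt(cv):5.1f}% high salt, "
          f"{nominal_nacl_mm(cv):7.1f} mM NaCl")
```

Because the first segment is shallow (about 3.3% high salt buffer per CV), material eluting early in the gradient does so at a comparatively low nominal NaCl concentration.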

[0249] The highest Δ51 GalNAcT2-MBP activity was achieved using refold 2a conditions
(pH 8.0) in combination with urea solubilization. Active Δ51 GalNAcT2-MBP eluted early
during QSFF elution. The 1 L refold yielded a total of 420 mU Δ51 GalNAcT2-MBP.

[0250] Additional refolding conditions for Δ51 GalNAcT2-MBP were screened. Refolding
buffer containing 55 mM MES pH 6.5, 264 mM NaCl, 11 mM KCl, 0.055% PEG 3350 and
550 mM L-arginine and refolding buffer containing 55 mM Tris-HCl pH 8.0, 10.56 mM
NaCl, 0.44 mM KCl, 0.055% PEG 3350 and 550 mM L-arginine were screened. Four
conditions were screened using the two buffers, namely, solubilization at pH 6.5 followed by
refolding at pH 6.5, solubilization at pH 6.5 followed by refolding at pH 8.0, solubilization at
pH 8.0 followed by refolding at pH 6.5, and solubilization at pH 8.0 followed by refolding at
pH 8.0. Assays of Δ51 GalNAcT2-MBP refolded under all four conditions demonstrated
enzymatic activity, i.e., the ability to transfer GalNAc to G-CSF.

2. Δ51 GalNAcT2-MBP Purification
[0251] The use of Q Sepharose Fast Flow (QSFF) and Q Sepharose XL (QXL) (Amersham
Biosciences, Piscataway, NJ) in Δ51 GalNAcT2-MBP purification was examined. QSFF was
used in binding mode. For this purpose, concentrated diafiltered Δ51 GalNAcT2-MBP
refolds (in 10 mM Tris/ 5 mM NaCl pH 8.0 (A)) were applied onto a pre-equilibrated 50 mL
QSFF column and eluted using a gradient from 10 mM Tris/ 5 mM NaCl pH 8.0 to 50%
10 mM Tris/ 1 M NaCl pH 8.0 (B) over 15 CV, followed by a second gradient from 50 to
100% B over 1 CV.

[0252] QXL was used in binding and in flow through mode. The NaCl concentration in the
concentrated diafiltered Δ51 GalNAcT2-MBP refold material (40 mL each = 160 mL refold
volume) was adjusted to 5, 50, 100, and 200 mM NaCl prior to application onto a 3.9 mL QXL
column. The column was washed with 2 CV and bound protein was eluted with a 30 CV
gradient from A to B.

[0253] Δ51 GalNAcT2-MBP bound tightly to QSFF resin under the above conditions with
5 mM NaCl in load and equilibration buffers. Active Δ51 GalNAcT2-MBP eluted at the
beginning of the major peak and appears as a doublet on a nonreduced 4-20% Tris-glycine
gel. The major contaminant is a currently unidentified band running at a slightly lower
molecular weight close to the 98 kDa marker band. A variety of other contaminants elute with
inactive Δ51 GalNAcT2-MBP in the remainder of the major peak.

[0254] Δ51 GalNAcT2-MBP bound tightly to QXL resin if the same conditions as for
QSFF binding were applied (i.e. 5 mM NaCl). Increasing Δ51 GalNAcT2-MBP activity was
observed in flow through and wash at higher NaCl concentrations in the load. Interestingly,
the major contaminating band observed in QSFF purification was not visible in the flow
through if the load contained 50 and 100 mM NaCl. At both NaCl concentrations the majority
of active Δ51 GalNAcT2-MBP could be found in flow through and wash; only some residual
Δ51 GalNAcT2-MBP activity was detected in the left shoulder of the elution peak. As
observed with QSFF resin, the bulk of contaminating bands was observed in the major
elution peak. Although the majority of active Δ51 GalNAcT2-MBP was located in the flow
through if the salt concentration of the load was adjusted to 200 mM, no significant
purification was achieved under this condition. In conclusion, optimum NaCl concentration
for the use of QXL in FT mode would be higher than 50 mM NaCl, but below 200 mM NaCl.
On the basis of these data, 100 mM NaCl is a suitable concentration in the load and in the
equilibration buffer in order to use the anion exchange resin in flowthrough mode.

[0255] Hydroxyapatite Type I (80 µm) (BioRad, Hercules, CA) was examined as a second
column step. Active Δ51 GalNAcT2-MBP partially purified over QSFF (using bind and elute
mode) was used to investigate if active Δ51 GalNAcT2-MBP would bind to an HA Type I
resin and would be useful to further purify the protein. For this purpose, a 2.25 mL HA Type
I column was pre-equilibrated with 5 mM NaPO4/ 5 mM NaCl pH 7.0 (C). Active Δ51
GalNAcT2-MBP eluted from QSFF was adjusted to pH 7.0 with 1 M HCl and applied onto
the HA Type I column. The protein was eluted using a 20 CV gradient from 0-50% 300 mM
NaPO4/ 5 mM NaCl pH 7.0 (D), followed by a 5 CV gradient from 50-100% D. The column
was regenerated using 0.5 M NaOH. The data obtained indicate that Δ51 GalNAcT2-MBP
binds to hydroxyapatite type I resin and can be eluted as an active enzyme.

[0256] The disclosures of each and every patent, patent application, and publication cited
herein are hereby incorporated herein by reference in their entirety.

[0257] While this invention has been disclosed with reference to specific embodiments, it is
apparent that other embodiments and variations of this invention may be devised by others
skilled in the art without departing from the true spirit and scope of the invention. The
appended claims are intended to be construed to include all such embodiments and equivalent
variations.

SEQUENCE LISTING

<110> Neose Technologies, Inc.
      Johnson, Karl
      Chen, Xi

<120> Truncated GalNAcT2 Polypeptides and Nucleic Acids
<130> 040853-01-5149PR
<160> 16

<170> PatentIn version 3.2

<210> 1
<211> 1713
<212> DNA
<213> human
<220>
<221> misc_feature
<223> wild-type GalNAcT2

<400> 1



atgcggcggc gctcgcggat gctgctctgc ttcgccttcc tgtgggtgct gggcatcgcc     60
tactacatgt actcgggggg cggctctgcg ctggccgggg gcgcgggcgg cggcgccggc    120
aggaaggagg actggaatga aattgacccc attaaaaaga aagaccttca tcacagcaat    180
ggagaagaga aagcacaaag catggagacc ctccctccag ggaaagtacg gtggccagac    240
tttaaccagg aagcttatgt tggagggacg atggtccgct ccgggcagga cccttacgcc    300
cgcaacaagt tcaaccaggt ggagagtgat aagcttcgaa tggacagagc catccctgac    360
acccggcatg accagtgtca gcggaagcag tggcgggtgg atctgccggc caccagcgtg    420
gtgatcacgt ttcacaatga agccaggtcg gccctactca ggaccgtggt cagcgtgctt    480
aagaaaagcc cgccccatct cataaaagaa atcatcttgg tggatgacta cagcaatgat    540
cctgaggacg gggctctctt ggggaaaatt gagaaagtgc gagttcttag aaatgatcga    600
cgagaaggcc tcatgcgctc acgggttcgg ggggccgatg ctgcccaagc caaggtcctg    660
accttcctgg acagtcactg cgagtgtaat gagcactggc tggagcccct cctggaaagg    720
gtggcggagg acaggactcg ggttgtgtca cccatcatcg atgtcattaa tatggacaac    780
tttcagtatg tgggggcatc tgctgacttg aagggcggtt ttgattggaa cttggtattc    840
aagtgggatt acatgacgcc tgagcagaga aggtcccggc aggggaaccc agtcgcccct    900
ataaaaaccc ccatgattgc tggtgggctg tttgtgatgg ataagttcta ttttgaagaa    960
ctggggaagt acgacatgat gatggatgtg tggggaggag agaacctaga gatctcgttc   1020
cgcgtgtggc agtgtggtgg cagcctggag atcatcccgt gcagccgtgt gggacacgtg   1080
ttccggaagc agcaccccta cacgttcccg ggtggcagtg gcactgtctt tgcccgaaac   1140
acccgccggg cagcagaggt ctggatggat gaatacaaaa atttctatta tgcagcagtg   1200
ccttctgcta gaaacgttcc ttatggaaat attcagagca gattggagct taggaagaaa   1260
ctcagctgca agcctttcaa atggtacctt gaaaatgtct atccagagtt aagggttcca   1320
gaccatcagg atatagcttt tggggccttg cagcagggaa ctaactgcct cgacactttg   1380
ggacactttg ctgatggtgt ggttggagtt tatgaatgtc acaatgctgg gggaaaccag   1440
gaatgggcct tgacgaagga gaagtcggtg aagcacatgg atttgtgcct tactgtggtg   1500
gaccgggcac cgggctctct tataaagctg cagggctgcc gagaaaatga cagcagacag   1560
aaatgggaac agatcgaggg caactccaag ctgaggcacg tgggcagcaa cctgtgcctg   1620
gacagtcgca cggccaagag cgggggccta agcgtggagg tgtgtggccc ggccctttcg   1680
cagcagtgga agttcacgct caacctgcag cag                                 1713






<210> 2
<211> 571
<212> PRT
<213> human
<220>
<221> MISC_FEATURE
<223> wild-type GalNAcT2

<400> 2

Met Arg Arg Arg Ser Arg Met Leu Leu Cys Phe Ala Phe Leu Trp Val
1 5 10 15

Leu Gly Ile Ala Tyr Tyr Met Tyr Ser Gly Gly Gly Ser Ala Leu Ala
20 25 30

Gly Gly Ala Gly Gly Gly Ala Gly Arg Lys Glu Asp Trp Asn Glu Ile
35 40 45

Asp Pro Ile Lys Lys Lys Asp Leu His His Ser Asn Gly Glu Glu Lys
50 55 60

Ala Gln Ser Met Glu Thr Leu Pro Pro Gly Lys Val Arg Trp Pro Asp
65 70 75 80

Phe Asn Gln Glu Ala Tyr Val Gly Gly Thr Met Val Arg Ser Gly Gln
85 90 95

Asp Pro Tyr Ala Arg Asn Lys Phe Asn Gln Val Glu Ser Asp Lys Leu
100 105 110

Arg Met Asp Arg Ala Ile Pro Asp Thr Arg His Asp Gln Cys Gln Arg
115 120 125

Lys Gln Trp Arg Val Asp Leu Pro Ala Thr Ser Val Val Ile Thr Phe
130 135 140

His Asn Glu Ala Arg Ser Ala Leu Leu Arg Thr Val Val Ser Val Leu
145 150 155 160

Lys Lys Ser Pro Pro His Leu Ile Lys Glu Ile Ile Leu Val Asp Asp
165 170 175

Tyr Ser Asn Asp Pro Glu Asp Gly Ala Leu Leu Gly Lys Ile Glu Lys
180 185 190

Val Arg Val Leu Arg Asn Asp Arg Arg Glu Gly Leu Met Arg Ser Arg
195 200 205

Val Arg Gly Ala Asp Ala Ala Gln Ala Lys Val Leu Thr Phe Leu Asp
210 215 220

Ser His Cys Glu Cys Asn Glu His Trp Leu Glu Pro Leu Leu Glu Arg
225 230 235 240

Val Ala Glu Asp Arg Thr Arg Val Val Ser Pro Ile Ile Asp Val Ile
245 250 255

Asn Met Asp Asn Phe Gln Tyr Val Gly Ala Ser Ala Asp Leu Lys Gly
260 265 270

Gly Phe Asp Trp Asn Leu Val Phe Lys Trp Asp Tyr Met Thr Pro Glu
275 280 285

Gln Arg Arg Ser Arg Gln Gly Asn Pro Val Ala Pro Ile Lys Thr Pro
290 295 300

Met Ile Ala Gly Gly Leu Phe Val Met Asp Lys Phe Tyr Phe Glu Glu
305 310 315 320

Leu Gly Lys Tyr Asp Met Met Met Asp Val Trp Gly Gly Glu Asn Leu
325 330 335

Glu Ile Ser Phe Arg Val Trp Gln Cys Gly Gly Ser Leu Glu Ile Ile
340 345 350

Pro Cys Ser Arg Val Gly His Val Phe Arg Lys Gln His Pro Tyr Thr
355 360 365

Phe Pro Gly Gly Ser Gly Thr Val Phe Ala Arg Asn Thr Arg Arg Ala
370 375 380

Ala Glu Val Trp Met Asp Glu Tyr Lys Asn Phe Tyr Tyr Ala Ala Val
385 390 395 400

Pro Ser Ala Arg Asn Val Pro Tyr Gly Asn Ile Gln Ser Arg Leu Glu
405 410 415

Leu Arg Lys Lys Leu Ser Cys Lys Pro Phe Lys Trp Tyr Leu Glu Asn
420 425 430

Val Tyr Pro Glu Leu Arg Val Pro Asp His Gln Asp Ile Ala Phe Gly
435 440 445

Ala Leu Gln Gln Gly Thr Asn Cys Leu Asp Thr Leu Gly His Phe Ala
450 455 460

Asp Gly Val Val Gly Val Tyr Glu Cys His Asn Ala Gly Gly Asn Gln
465 470 475 480

Glu Trp Ala Leu Thr Lys Glu Lys Ser Val Lys His Met Asp Leu Cys
485 490 495

Leu Thr Val Val Asp Arg Ala Pro Gly Ser Leu Ile Lys Leu Gln Gly
500 505 510

Cys Arg Glu Asn Asp Ser Arg Gln Lys Trp Glu Gln Ile Glu Gly Asn
515 520 525

Ser Lys Leu Arg His Val Gly Ser Asn Leu Cys Leu Asp Ser Arg Thr
530 535 540

Ala Lys Ser Gly Gly Leu Ser Val Glu Val Cys Gly Pro Ala Leu Ser
545 550 555 560

Gln Gln Trp Lys Phe Thr Leu Asn Leu Gln Gln
565 570



<210> 3
<211> 1593
<212> DNA
<213> human
<220>
<221> misc_feature
<223> delta 40 GalNAcT2

<400> 3

aggaaggagg actggaatga aattgacccc attaaaaaga aagaccttca tcacagcaat     60
ggagaagaga aagcacaaag catggagacc ctccctccag ggaaagtacg gtggccagac    120
tttaaccagg aagcttatgt tggagggacg atggtccgct ccgggcagga cccttacgcc    180
cgcaacaagt tcaaccaggt ggagagtgat aagcttcgaa tggacagagc catccctgac    240
acccggcatg accagtgtca gcggaagcag tggcgggtgg atctgccggc caccagcgtg    300
gtgatcacgt ttcacaatga agccaggtcg gccctactca ggaccgtggt cagcgtgctt    360
aagaaaagcc cgccccatct cataaaagaa atcatcttgg tggatgacta cagcaatgat    420
cctgaggacg gggctctctt ggggaaaatt gagaaagtgc gagttcttag aaatgatcga    480
cgagaaggcc tcatgcgctc acgggttcgg ggggccgatg ctgcccaagc caaggtcctg    540
accttcctgg acagtcactg cgagtgtaat gagcactggc tggagcccct cctggaaagg    600
gtggcggagg acaggactcg ggttgtgtca cccatcatcg atgtcattaa tatggacaac    660
tttcagtatg tgggggcatc tgctgacttg aagggcggtt ttgattggaa cttggtattc    720
aagtgggatt acatgacgcc tgagcagaga aggtcccggc aggggaaccc agtcgcccct    780
ataaaaaccc ccatgattgc tggtgggctg tttgtgatgg ataagttcta ttttgaagaa    840
ctggggaagt acgacatgat gatggatgtg tggggaggag agaacctaga gatctcgttc    900
cgcgtgtggc agtgtggtgg cagcctggag atcatcccgt gcagccgtgt gggacacgtg    960
ttccggaagc agcaccccta cacgttcccg ggtggcagtg gcactgtctt tgcccgaaac   1020
acccgccggg cagcagaggt ctggatggat gaatacaaaa atttctatta tgcagcagtg   1080
ccttctgcta gaaacgttcc ttatggaaat attcagagca gattggagct taggaagaaa   1140
ctcagctgca agcctttcaa atggtacctt gaaaatgtct atccagagtt aagggttcca   1200
gaccatcagg atatagcttt tggggccttg cagcagggaa ctaactgcct cgacactttg   1260
ggacactttg ctgatggtgt ggttggagtt tatgaatgtc acaatgctgg gggaaaccag   1320
gaatgggcct tgacgaagga gaagtcggtg aagcacatgg atttgtgcct tactgtggtg   1380
gaccgggcac cgggctctct tataaagctg cagggctgcc gagaaaatga cagcagacag   1440
aaatgggaac agatcgaggg caactccaag ctgaggcacg tgggcagcaa cctgtgcctg   1500
gacagtcgca cggccaagag cgggggccta agcgtggagg tgtgtggccc ggccctttcg   1560
cagcagtgga agttcacgct caacctgcag cag                                 1593




<210> 4
<211> 531
<212> PRT
<213> human
<220>
<221> MISC_FEATURE
<223> delta 40 GalNAcT2

<400> 4

Arg Lys Glu Asp Trp Asn Glu Ile Asp Pro Ile Lys Lys Lys Asp Leu
1 5 10 15

His His Ser Asn Gly Glu Glu Lys Ala Gln Ser Met Glu Thr Leu Pro
20 25 30

Pro Gly Lys Val Arg Trp Pro Asp Phe Asn Gln Glu Ala Tyr Val Gly
35 40 45

Gly Thr Met Val Arg Ser Gly Gln Asp Pro Tyr Ala Arg Asn Lys Phe
50 55 60

Asn Gln Val Glu Ser Asp Lys Leu Arg Met Asp Arg Ala Ile Pro Asp
65 70 75 80

Thr Arg His Asp Gln Cys Gln Arg Lys Gln Trp Arg Val Asp Leu Pro
85 90 95

Ala Thr Ser Val Val Ile Thr Phe His Asn Glu Ala Arg Ser Ala Leu
100 105 110

Leu Arg Thr Val Val Ser Val Leu Lys Lys Ser Pro Pro His Leu Ile
115 120 125

Lys Glu Ile Ile Leu Val Asp Asp Tyr Ser Asn Asp Pro Glu Asp Gly
130 135 140

Ala Leu Leu Gly Lys Ile Glu Lys Val Arg Val Leu Arg Asn Asp Arg
145 150 155 160

Arg Glu Gly Leu Met Arg Ser Arg Val Arg Gly Ala Asp Ala Ala Gln
165 170 175

Ala Lys Val Leu Thr Phe Leu Asp Ser His Cys Glu Cys Asn Glu His
180 185 190

Trp Leu Glu Pro Leu Leu Glu Arg Val Ala Glu Asp Arg Thr Arg Val
195 200 205

Val Ser Pro Ile Ile Asp Val Ile Asn Met Asp Asn Phe Gln Tyr Val
210 215 220

Gly Ala Ser Ala Asp Leu Lys Gly Gly Phe Asp Trp Asn Leu Val Phe
225 230 235 240

Lys Trp Asp Tyr Met Thr Pro Glu Gln Arg Arg Ser Arg Gln Gly Asn
245 250 255

Pro Val Ala Pro Ile Lys Thr Pro Met Ile Ala Gly Gly Leu Phe Val
260 265 270

Met Asp Lys Phe Tyr Phe Glu Glu Leu Gly Lys Tyr Asp Met Met Met
275 280 285

Asp Val Trp Gly Gly Glu Asn Leu Glu Ile Ser Phe Arg Val Trp Gln
290 295 300

Cys Gly Gly Ser Leu Glu Ile Ile Pro Cys Ser Arg Val Gly His Val
305 310 315 320

Phe Arg Lys Gln His Pro Tyr Thr Phe Pro Gly Gly Ser Gly Thr Val
325 330 335

Phe Ala Arg Asn Thr Arg Arg Ala Ala Glu Val Trp Met Asp Glu Tyr
340 345 350

Lys Asn Phe Tyr Tyr Ala Ala Val Pro Ser Ala Arg Asn Val Pro Tyr
355 360 365

Gly Asn Ile Gln Ser Arg Leu Glu Leu Arg Lys Lys Leu Ser Cys Lys
370 375 380

Pro Phe Lys Trp Tyr Leu Glu Asn Val Tyr Pro Glu Leu Arg Val Pro
385 390 395 400

Asp His Gln Asp Ile Ala Phe Gly Ala Leu Gln Gln Gly Thr Asn Cys
405 410 415

Leu Asp Thr Leu Gly His Phe Ala Asp Gly Val Val Gly Val Tyr Glu
420 425 430

Cys His Asn Ala Gly Gly Asn Gln Glu Trp Ala Leu Thr Lys Glu Lys
435 440 445

Ser Val Lys His Met Asp Leu Cys Leu Thr Val Val Asp Arg Ala Pro
450 455 460

Gly Ser Leu Ile Lys Leu Gln Gly Cys Arg Glu Asn Asp Ser Arg Gln
465 470 475 480

Lys Trp Glu Gln Ile Glu Gly Asn Ser Lys Leu Arg His Val Gly Ser
485 490 495

Asn Leu Cys Leu Asp Ser Arg Thr Ala Lys Ser Gly Gly Leu Ser Val
500 505 510

Glu Val Cys Gly Pro Ala Leu Ser Gln Gln Trp Lys Phe Thr Leu Asn
515 520 525

Leu Gln Gln
530

<210> 5
<211> 1560
<212> DNA
<213> human
<220>
<221> misc_feature
<223> delta 51 GalNAcT2

<400> 5

aaaaagaaag accttcatca cagcaatgga gaagagaaag cacaaagcat ggagaccctc     60
cctccaggga aagtacggtg gccagacttt aaccaggaag cttatgttgg agggacgatg    120
gtccgctccg ggcaggaccc ttacgcccgc aacaagttca accaggtgga gagtgataag    180
cttcgaatgg acagagccat ccctgacacc cggcatgacc agtgtcagcg gaagcagtgg    240
cgggtggatc tgccggccac cagcgtggtg atcacgtttc acaatgaagc caggtcggcc    300
ctactcagga ccgtggtcag cgtgcttaag aaaagcccgc cccatctcat aaaagaaatc    360
atcttggtgg atgactacag caatgatcct gaggacgggg ctctcttggg gaaaattgag    420
aaagtgcgag ttcttagaaa tgatcgacga gaaggcctca tgcgctcacg ggttcggggg    480
gccgatgctg cccaagccaa ggtcctgacc ttcctggaca gtcactgcga gtgtaatgag    540
cactggctgg agcccctcct ggaaagggtg gcggaggaca ggactcgggt tgtgtcaccc    600
atcatcgatg tcattaatat ggacaacttt cagtatgtgg gggcatctgc tgacttgaag    660
ggcggttttg attggaactt ggtattcaag tgggattaca tgacgcctga gcagagaagg    720
tcccggcagg ggaacccagt cgcccctata aaaaccccca tgattgctgg tgggctgttt    780
gtgatggata agttctattt tgaagaactg gggaagtacg acatgatgat ggatgtgtgg    840
ggaggagaga acctagagat ctcgttccgc gtgtggcagt gtggtggcag cctggagatc    900
atcccgtgca gccgtgtggg acacgtgttc cggaagcagc acccctacac gttcccgggt    960
ggcagtggca ctgtctttgc ccgaaacacc cgccgggcag cagaggtctg gatggatgaa   1020
tacaaaaatt tctattatgc agcagtgcct tctgctagaa acgttcctta tggaaatatt   1080
cagagcagat tggagcttag gaagaaactc agctgcaagc ctttcaaatg gtaccttgaa   1140
aatgtctatc cagagttaag ggttccagac catcaggata tagcttttgg ggccttgcag   1200
cagggaacta actgcctcga cactttggga cactttgctg atggtgtggt tggagtttat   1260
gaatgtcaca atgctggggg aaaccaggaa tgggccttga cgaaggagaa gtcggtgaag   1320
cacatggatt tgtgccttac tgtggtggac cgggcaccgg gctctcttat aaagctgcag   1380
ggctgccgag aaaatgacag cagacagaaa tgggaacaga tcgagggcaa ctccaagctg   1440
aggcacgtgg gcagcaacct gtgcctggac agtcgcacgg ccaagagcgg gggcctaagc   1500
gtggaggtgt gtggcccggc cctttcgcag cagtggaagt tcacgctcaa cctgcagcag   1560



<210> 6
<211> 520
<212> PRT
<213> human
<220>
<221> MISC_FEATURE
<223> delta 51 GalNAcT2

<400> 6

Lys Lys Lys Asp Leu His His Ser Asn Gly Glu Glu Lys Ala Gln Ser
1 5 10 15

Met Glu Thr Leu Pro Pro Gly Lys Val Arg Trp Pro Asp Phe Asn Gln
20 25 30

Glu Ala Tyr Val Gly Gly Thr Met Val Arg Ser Gly Gln Asp Pro Tyr
35 40 45

Ala Arg Asn Lys Phe Asn Gln Val Glu Ser Asp Lys Leu Arg Met Asp
50 55 60

Arg Ala Ile Pro Asp Thr Arg His Asp Gln Cys Gln Arg Lys Gln Trp
65 70 75 80

Arg Val Asp Leu Pro Ala Thr Ser Val Val Ile Thr Phe His Asn Glu
85 90 95

Ala Arg Ser Ala Leu Leu Arg Thr Val Val Ser Val Leu Lys Lys Ser
100 105 110

Pro Pro His Leu Ile Lys Glu Ile Ile Leu Val Asp Asp Tyr Ser Asn
115 120 125

Asp Pro Glu Asp Gly Ala Leu Leu Gly Lys Ile Glu Lys Val Arg Val
130 135 140

Leu Arg Asn Asp Arg Arg Glu Gly Leu Met Arg Ser Arg Val Arg Gly
145 150 155 160

Ala Asp Ala Ala Gln Ala Lys Val Leu Thr Phe Leu Asp Ser His Cys
165 170 175

Glu Cys Asn Glu His Trp Leu Glu Pro Leu Leu Glu Arg Val Ala Glu
180 185 190

Asp Arg Thr Arg Val Val Ser Pro Ile Ile Asp Val Ile Asn Met Asp
195 200 205

Asn Phe Gln Tyr Val Gly Ala Ser Ala Asp Leu Lys Gly Gly Phe Asp
210 215 220

Trp Asn Leu Val Phe Lys Trp Asp Tyr Met Thr Pro Glu Gln Arg Arg
225 230 235 240

Ser Arg Gln Gly Asn Pro Val Ala Pro Ile Lys Thr Pro Met Ile Ala
245 250 255

Gly Gly Leu Phe Val Met Asp Lys Phe Tyr Phe Glu Glu Leu Gly Lys
260 265 270

Tyr Asp Met Met Met Asp Val Trp Gly Gly Glu Asn Leu Glu Ile Ser
275 280 285

Phe Arg Val Trp Gln Cys Gly Gly Ser Leu Glu Ile Ile Pro Cys Ser
290 295 300

Arg Val Gly His Val Phe Arg Lys Gln His Pro Tyr Thr Phe Pro Gly
305 310 315 320

Gly Ser Gly Thr Val Phe Ala Arg Asn Thr Arg Arg Ala Ala Glu Val
325 330 335

Trp Met Asp Glu Tyr Lys Asn Phe Tyr Tyr Ala Ala Val Pro Ser Ala
340 345 350

Arg Asn Val Pro Tyr Gly Asn Ile Gln Ser Arg Leu Glu Leu Arg Lys
355 360 365

Lys Leu Ser Cys Lys Pro Phe Lys Trp Tyr Leu Glu Asn Val Tyr Pro
370 375 380

Glu Leu Arg Val Pro Asp His Gln Asp Ile Ala Phe Gly Ala Leu Gln
385 390 395 400

Gln Gly Thr Asn Cys Leu Asp Thr Leu Gly His Phe Ala Asp Gly Val
405 410 415

Val Gly Val Tyr Glu Cys His Asn Ala Gly Gly Asn Gln Glu Trp Ala
420 425 430

Leu Thr Lys Glu Lys Ser Val Lys His Met Asp Leu Cys Leu Thr Val
435 440 445

Val Asp Arg Ala Pro Gly Ser Leu Ile Lys Leu Gln Gly Cys Arg Glu
450 455 460

Asn Asp Ser Arg Gln Lys Trp Glu Gln Ile Glu Gly Asn Ser Lys Leu
465 470 475 480

Arg His Val Gly Ser Asn Leu Cys Leu Asp Ser Arg Thr Ala Lys Ser
485 490 495

Gly Gly Leu Ser Val Glu Val Cys Gly Pro Ala Leu Ser Gln Gln Trp
500 505 510

Lys Phe Thr Leu Asn Leu Gln Gln
515 520
515 520 
<210> 7

<211> 1494

<212> DNA

<213> human

<220>

<221> misc_feature

<223> delta 73 GalNAcT2

<400> 7

gggaaagtac ggtggccaga ctttaaccag gaagcttatg ttggagggac gatggtccgc 60

tccgggcagg acccttacgc ccgcaacaag ttcaaccagg tggagagtga taagcttcga 120

atggacagag ccatccctga cacccggcat gaccagtgtc agcggaagca gtggcgggtg 180

gatctgccgg ccaccagcgt ggtgatcacg tttcacaatg aagccaggtc ggccctactc 240

aggaccgtgg tcagcgtgct taagaaaagc ccgccccatc tcataaaaga aatcatcttg 300

gtggatgact acagcaatga tcctgaggac ggggctctct tggggaaaat tgagaaagtg 360

cgagttctta gaaatgatcg acgagaaggc ctcatgcgct cacgggttcg gggggccgat 420

gctgcccaag ccaaggtcct gaccttcctg gacagtcact gcgagtgtaa tgagcactgg 480

ctggagcccc tcctggaaag ggtggcggag gacaggactc gggttgtgtc acccatcatc 540

gatgtcatta atatggacaa ctttcagtat gtgggggcat ctgctgactt gaagggcggt 600

tttgattgga acttggtatt caagtgggat tacatgacgc ctgagcagag aaggtcccgg 660

caggggaacc cagtcgcccc tataaaaacc cccatgattg ctggtgggct gtttgtgatg 720

gataagttct attttgaaga actggggaag tacgacatga tgatggatgt gtggggagga 780

gagaacctag agatctcgtt ccgcgtgtgg cagtgtggtg gcagcctgga gatcatcccg 840

tgcagccgtg tgggacacgt gttccggaag cagcacccct acacgttccc gggtggcagt 900

ggcactgtct ttgcccgaaa cacccgccgg gcagcagagg tctggatgga tgaatacaaa 960

aatttctatt atgcagcagt gccttctgct agaaacgttc cttatggaaa tattcagagc 1020

agattggagc ttaggaagaa actcagctgc aagcctttca aatggtacct tgaaaatgtc 1080

tatccagagt taagggttcc agaccatcag gatatagctt ttggggcctt gcagcaggga 1140

actaactgcc tcgacacttt gggacacttt gctgatggtg tggttggagt ttatgaatgt 1200

cacaatgctg ggggaaacca ggaatgggcc ttgacgaagg agaagtcggt gaagcacatg 1260

gatttgtgcc ttactgtggt ggaccgggca ccgggctctc ttataaagct gcagggctgc 1320

cgagaaaatg acagcagaca gaaatgggaa cagatcgagg gcaactccaa gctgaggcac 1380

gtgggcagca acctgtgcct ggacagtcgc acggccaaga gcgggggcct aagcgtggag 1440

gtgtgtggcc cggccctttc gcagcagtgg aagttcacgc tcaacctgca gcag 1494
<210> 8

<211> 498

<212> PRT

<213> human

<220>

<221> MISC_FEATURE

<223> delta 73 GalNAcT2

<400> 8

Gly Lys Val Arg Trp Pro Asp Phe Asn Gln Glu Ala Tyr Val Gly Gly
1 5 10 15

Thr Met Val Arg Ser Gly Gln Asp Pro Tyr Ala Arg Asn Lys Phe Asn
20 25 30

Gln Val Glu Ser Asp Lys Leu Arg Met Asp Arg Ala Ile Pro Asp Thr
35 40 45

Arg His Asp Gln Cys Gln Arg Lys Gln Trp Arg Val Asp Leu Pro Ala
50 55 60

Thr Ser Val Val Ile Thr Phe His Asn Glu Ala Arg Ser Ala Leu Leu
65 70 75 80

Arg Thr Val Val Ser Val Leu Lys Lys Ser Pro Pro His Leu Ile Lys
85 90 95

Glu Ile Ile Leu Val Asp Asp Tyr Ser Asn Asp Pro Glu Asp Gly Ala
100 105 110

Leu Leu Gly Lys Ile Glu Lys Val Arg Val Leu Arg Asn Asp Arg Arg
115 120 125

Glu Gly Leu Met Arg Ser Arg Val Arg Gly Ala Asp Ala Ala Gln Ala
130 135 140

Lys Val Leu Thr Phe Leu Asp Ser His Cys Glu Cys Asn Glu His Trp
145 150 155 160

Leu Glu Pro Leu Leu Glu Arg Val Ala Glu Asp Arg Thr Arg Val Val
165 170 175

Ser Pro Ile Ile Asp Val Ile Asn Met Asp Asn Phe Gln Tyr Val Gly
180 185 190

Ala Ser Ala Asp Leu Lys Gly Gly Phe Asp Trp Asn Leu Val Phe Lys
195 200 205

Trp Asp Tyr Met Thr Pro Glu Gln Arg Arg Ser Arg Gln Gly Asn Pro
210 215 220

Val Ala Pro Ile Lys Thr Pro Met Ile Ala Gly Gly Leu Phe Val Met
225 230 235 240

Asp Lys Phe Tyr Phe Glu Glu Leu Gly Lys Tyr Asp Met Met Met Asp
245 250 255

Val Trp Gly Gly Glu Asn Leu Glu Ile Ser Phe Arg Val Trp Gln Cys
260 265 270

Gly Gly Ser Leu Glu Ile Ile Pro Cys Ser Arg Val Gly His Val Phe
275 280 285

Arg Lys Gln His Pro Tyr Thr Phe Pro Gly Gly Ser Gly Thr Val Phe
290 295 300

Ala Arg Asn Thr Arg Arg Ala Ala Glu Val Trp Met Asp Glu Tyr Lys
305 310 315 320

Asn Phe Tyr Tyr Ala Ala Val Pro Ser Ala Arg Asn Val Pro Tyr Gly
325 330 335

Asn Ile Gln Ser Arg Leu Glu Leu Arg Lys Lys Leu Ser Cys Lys Pro
340 345 350

Phe Lys Trp Tyr Leu Glu Asn Val Tyr Pro Glu Leu Arg Val Pro Asp
355 360 365

His Gln Asp Ile Ala Phe Gly Ala Leu Gln Gln Gly Thr Asn Cys Leu
370 375 380

Asp Thr Leu Gly His Phe Ala Asp Gly Val Val Gly Val Tyr Glu Cys
385 390 395 400

His Asn Ala Gly Gly Asn Gln Glu Trp Ala Leu Thr Lys Glu Lys Ser
405 410 415

Val Lys His Met Asp Leu Cys Leu Thr Val Val Asp Arg Ala Pro Gly
420 425 430

Ser Leu Ile Lys Leu Gln Gly Cys Arg Glu Asn Asp Ser Arg Gln Lys
435 440 445

Trp Glu Gln Ile Glu Gly Asn Ser Lys Leu Arg His Val Gly Ser Asn
450 455 460

Leu Cys Leu Asp Ser Arg Thr Ala Lys Ser Gly Gly Leu Ser Val Glu
465 470 475 480

Val Cys Gly Pro Ala Leu Ser Gln Gln Trp Lys Phe Thr Leu Asn Leu
485 490 495

Gln Gln



<210> 9

<211> 1431

<212> DNA

<213> human

<220>

<221> misc_feature

<223> delta 94 GalNAcT2

<400> 9

gggcaggacc cttacgcccg caacaagttc aaccaggtgg agagtgataa gcttcgaatg 60

gacagagcca tccctgacac ccggcatgac cagtgtcagc ggaagcagtg gcgggtggat 120

ctgccggcca ccagcgtggt gatcacgttt cacaatgaag ccaggtcggc cctactcagg 180

accgtggtca gcgtgcttaa gaaaagcccg ccccatctca taaaagaaat catcttggtg 240

gatgactaca gcaatgatcc tgaggacggg gctctcttgg ggaaaattga gaaagtgcga 300

gttcttagaa atgatcgacg agaaggcctc atgcgctcac gggttcgggg ggccgatgct 360

gcccaagcca aggtcctgac cttcctggac agtcactgcg agtgtaatga gcactggctg 420

gagcccctcc tggaaagggt ggcggaggac aggactcggg ttgtgtcacc catcatcgat 480

gtcattaata tggacaactt tcagtatgtg ggggcatctg ctgacttgaa gggcggtttt 540

gattggaact tggtattcaa gtgggattac atgacgcctg agcagagaag gtcccggcag 600

gggaacccag tcgcccctat aaaaaccccc atgattgctg gtgggctgtt tgtgatggat 660

aagttctatt ttgaagaact ggggaagtac gacatgatga tggatgtgtg gggaggagag 720

aacctagaga tctcgttccg cgtgtggcag tgtggtggca gcctggagat catcccgtgc 780

agccgtgtgg gacacgtgtt ccggaagcag cacccctaca cgttcccggg tggcagtggc 840

actgtctttg cccgaaacac ccgccgggca gcagaggtct ggatggatga atacaaaaat 900

ttctattatg cagcagtgcc ttctgctaga aacgttcctt atggaaatat tcagagcaga 960

ttggagctta ggaagaaact cagctgcaag cctttcaaat ggtaccttga aaatgtctat 1020

ccagagttaa gggttccaga ccatcaggat atagcttttg gggccttgca gcagggaact 1080

aactgcctcg acactttggg acactttgct gatggtgtgg ttggagttta tgaatgtcac 1140

aatgctgggg gaaaccagga atgggccttg acgaaggaga agtcggtgaa gcacatggat 1200

ttgtgcctta ctgtggtgga ccgggcaccg ggctctctta taaagctgca gggctgccga 1260

gaaaatgaca gcagacagaa atgggaacag atcgagggca actccaagct gaggcacgtg 1320

ggcagcaacc tgtgcctgga cagtcgcacg gccaagagcg ggggcctaag cgtggaggtg 1380

tgtggcccgg ccctttcgca gcagtggaag ttcacgctca acctgcagca g 1431



<210> 10 

<211> 477 

<212> PRT 

<213> human 
<220> 

<221> MISC_FEATURE

<223> delta 94 GalNAcT2 

<400> 10 

Gly Gln Asp Pro Tyr Ala Arg Asn Lys Phe Asn Gln Val Glu Ser Asp
1 5 10 15

Lys Leu Arg Met Asp Arg Ala Ile Pro Asp Thr Arg His Asp Gln Cys
20 25 30

Gln Arg Lys Gln Trp Arg Val Asp Leu Pro Ala Thr Ser Val Val Ile
35 40 45

Thr Phe His Asn Glu Ala Arg Ser Ala Leu Leu Arg Thr Val Val Ser
50 55 60

Val Leu Lys Lys Ser Pro Pro His Leu Ile Lys Glu Ile Ile Leu Val
65 70 75 80

Asp Asp Tyr Ser Asn Asp Pro Glu Asp Gly Ala Leu Leu Gly Lys Ile
85 90 95

Glu Lys Val Arg Val Leu Arg Asn Asp Arg Arg Glu Gly Leu Met Arg
100 105 110

Ser Arg Val Arg Gly Ala Asp Ala Ala Gln Ala Lys Val Leu Thr Phe
115 120 125

Leu Asp Ser His Cys Glu Cys Asn Glu His Trp Leu Glu Pro Leu Leu
130 135 140

Glu Arg Val Ala Glu Asp Arg Thr Arg Val Val Ser Pro Ile Ile Asp
145 150 155 160

Val Ile Asn Met Asp Asn Phe Gln Tyr Val Gly Ala Ser Ala Asp Leu
165 170 175

Lys Gly Gly Phe Asp Trp Asn Leu Val Phe Lys Trp Asp Tyr Met Thr
180 185 190

Pro Glu Gln Arg Arg Ser Arg Gln Gly Asn Pro Val Ala Pro Ile Lys
195 200 205

Thr Pro Met Ile Ala Gly Gly Leu Phe Val Met Asp Lys Phe Tyr Phe
210 215 220

Glu Glu Leu Gly Lys Tyr Asp Met Met Met Asp Val Trp Gly Gly Glu
225 230 235 240

Asn Leu Glu Ile Ser Phe Arg Val Trp Gln Cys Gly Gly Ser Leu Glu
245 250 255

Ile Ile Pro Cys Ser Arg Val Gly His Val Phe Arg Lys Gln His Pro
260 265 270

Tyr Thr Phe Pro Gly Gly Ser Gly Thr Val Phe Ala Arg Asn Thr Arg
275 280 285

Arg Ala Ala Glu Val Trp Met Asp Glu Tyr Lys Asn Phe Tyr Tyr Ala
290 295 300

Ala Val Pro Ser Ala Arg Asn Val Pro Tyr Gly Asn Ile Gln Ser Arg
305 310 315 320

Leu Glu Leu Arg Lys Lys Leu Ser Cys Lys Pro Phe Lys Trp Tyr Leu
325 330 335

Glu Asn Val Tyr Pro Glu Leu Arg Val Pro Asp His Gln Asp Ile Ala
340 345 350

Phe Gly Ala Leu Gln Gln Gly Thr Asn Cys Leu Asp Thr Leu Gly His
355 360 365

Phe Ala Asp Gly Val Val Gly Val Tyr Glu Cys His Asn Ala Gly Gly
370 375 380

Asn Gln Glu Trp Ala Leu Thr Lys Glu Lys Ser Val Lys His Met Asp
385 390 395 400

Leu Cys Leu Thr Val Val Asp Arg Ala Pro Gly Ser Leu Ile Lys Leu
405 410 415

Gln Gly Cys Arg Glu Asn Asp Ser Arg Gln Lys Trp Glu Gln Ile Glu
420 425 430

Gly Asn Ser Lys Leu Arg His Val Gly Ser Asn Leu Cys Leu Asp Ser
435 440 445

Arg Thr Ala Lys Ser Gly Gly Leu Ser Val Glu Val Cys Gly Pro Ala
450 455 460



Leu Ser Gln Gln Trp Lys Phe Thr Leu Asn Leu Gln Gln
465 470 475




<210> 11

<211> 28

<212> DNA

<213> artificial sequence

<220>

<223> N41R primer

<400> 11

cgcggatcca ggaaggagga ctggaatg 28



<210> 12

<211> 33

<212> DNA

<213> artificial sequence

<220>

<223> N52K primer

<400> 12

cgcggatcca aaaagaaaga ccttcatcac 33



<210> 13

<211> 30

<212> DNA

<213> artificial sequence

<220>

<223> N74G primer

<400> 13

cgcggatccg ggaaagtacg gtggccagac 30



<210> 14

<211> 27

<212> DNA

<213> artificial sequence

<220>

<223> N95G primer

<400> 14

cgcggatccg ggcaggaccc ttacgcc 27 



<210> 15

<211> 29

<212> DNA

<213> artificial sequence

<220>

<223> Antisense Primer with STOP codon

<400> 15

ctgctcgagc tactgctgca ggttgagcg 29



<210> 16

<211> 10

<212> PRT

<213> artificial sequence

<220>

<223> MuC-2-like peptide

<400> 16

Met Val Thr Pro Thr Pro Thr Pro Thr Cys
1 5 10



WO 2005/121331



PCT/US2005/019442 



WHAT IS CLAIMED IS: 

1. An isolated nucleic acid comprising a nucleic acid sequence encoding a truncated human GalNAcT2 polypeptide, wherein said truncated human GalNAcT2 polypeptide is lacking all or a portion of the GalNAcT2 signal domain, with the proviso that the encoded polypeptide is not a human GalNAcT2 truncation mutant polypeptide lacking amino acid residues 1-51.

2. The isolated nucleic acid of claim 1, wherein said truncated human GalNAcT2 polypeptide is further lacking all or a portion of the GalNAcT2 transmembrane domain, with the proviso that the encoded polypeptide is not a human GalNAcT2 truncation mutant polypeptide lacking amino acid residues 1-51.

3. The isolated nucleic acid of claim 2, wherein said truncated human GalNAcT2 polypeptide is further lacking all or a portion of the GalNAcT2 stem domain, with the proviso that the encoded polypeptide is not a human GalNAcT2 truncation mutant polypeptide lacking amino acid residues 1-51.

4. The isolated nucleic acid of claim 1, comprising a nucleic acid sequence encoding a truncated human GalNAcT2 polypeptide, said nucleic acid sequence having at least 90% identity with a nucleic acid selected from the group consisting of SEQ ID NO:3, SEQ ID NO:7 and SEQ ID NO:9.

5. The isolated nucleic acid of claim 4, said isolated nucleic acid comprising a nucleic acid sequence selected from the group consisting of SEQ ID NO:3, SEQ ID NO:7 and SEQ ID NO:9.

6. An isolated nucleic acid of claim 4, said isolated nucleic acid consisting of a nucleic acid sequence selected from the group consisting of SEQ ID NO:3, SEQ ID NO:7 and SEQ ID NO:9.

7. An isolated chimeric nucleic acid encoding a fusion polypeptide, said fusion polypeptide comprising a tag polypeptide covalently linked to a second polypeptide encoded by the isolated nucleic acid of claim 1.

8. The isolated chimeric nucleic acid of claim 7, wherein said tag polypeptide is selected from the group consisting of a maltose binding protein, a histidine tag, a Factor IX tag, a glutathione-S-transferase tag, a FLAG-tag, and a starch binding domain tag.

9. An isolated truncated human GalNAcT2 polypeptide, wherein said truncated human GalNAcT2 polypeptide is lacking all or a portion of the GalNAcT2 signal domain, with the proviso that said polypeptide is not a human GalNAcT2 polypeptide truncation mutant lacking amino acid residues 1-51.

10. The isolated truncated human GalNAcT2 polypeptide of claim 9, wherein said truncated human GalNAcT2 polypeptide is further lacking all or a portion of the GalNAcT2 transmembrane domain, with the proviso that said polypeptide is not a human GalNAcT2 polypeptide truncation mutant lacking amino acid residues 1-51.

11. The isolated truncated human GalNAcT2 polypeptide of claim 10, wherein said truncated human GalNAcT2 polypeptide is further lacking all or a portion of the GalNAcT2 stem domain, with the proviso that said polypeptide is not a human GalNAcT2 polypeptide truncation mutant lacking amino acid residues 1-51.

12. The isolated truncated human GalNAcT2 polypeptide of claim 9, having at least 90% identity with a polypeptide selected from the group consisting of SEQ ID NO:4, SEQ ID NO:8 and SEQ ID NO:10.

13. The isolated truncated human GalNAcT2 polypeptide of claim 9, comprising an amino acid sequence selected from the group consisting of SEQ ID NO:4, SEQ ID NO:8 and SEQ ID NO:10.

14. The isolated truncated human GalNAcT2 polypeptide of claim 9, consisting of an amino acid sequence selected from the group consisting of SEQ ID NO:4, SEQ ID NO:8 and SEQ ID NO:10.

15. An isolated chimeric polypeptide comprising a tag polypeptide covalently linked to the isolated truncated GalNAcT2 polypeptide of claim 9.

16. The isolated chimeric polypeptide of claim 15, wherein said tag polypeptide is selected from the group consisting of a maltose binding protein, a histidine tag, a Factor IX tag, a glutathione-S-transferase tag, a FLAG-tag, and a starch binding domain tag.
17. The isolated nucleic acid of claim 1, said nucleic acid further comprising a promoter/regulatory sequence operably linked thereto.

18. An expression vector comprising the isolated nucleic acid of claim 1.

19. A recombinant cell comprising the isolated expression vector of claim 18.

20. A recombinant cell of claim 19, wherein said recombinant cell is a eukaryotic cell or a prokaryotic cell.

21. The recombinant cell of claim 20, wherein said eukaryotic cell is selected from the group consisting of a mammalian cell, an insect cell, and a fungal cell.

22. The recombinant cell of claim 21, wherein said insect cell is selected from the group consisting of an SF9 cell, an SF9+ cell, an Sf21 cell, a HIGH FIVE cell or Drosophila Schneider S2 cell.

23. The recombinant cell of claim 20, wherein said prokaryotic cell is selected from the group consisting of an E. coli cell and a B. subtilis cell.

24. A method of producing a truncated human GalNAcT2 polypeptide, the method comprising growing the recombinant cell of claim 20 under conditions suitable for expression of the truncated human GalNAcT2 polypeptide.

25. A method of catalyzing the transfer of a GalNAc moiety to an acceptor moiety comprising incubating the polypeptide of claim 9 with a GalNAc moiety and an acceptor moiety, wherein said polypeptide mediates the covalent linkage of said GalNAc moiety to said acceptor moiety, thereby catalyzing the transfer of a GalNAc moiety to an acceptor moiety to produce a product saccharide, or a product glycoprotein, or a product glycopeptide.

26. The method of claim 25, wherein said acceptor moiety is a granulocyte colony stimulating factor (G-CSF) protein.

27. The method of claim 25, wherein said acceptor moiety is selected from the group consisting of erythropoietin, human growth hormone, granulocyte colony stimulating factor, interferons alpha, -beta, and -gamma, Factor IX, follicle stimulating hormone, interleukin-2, erythropoietin, anti-TNF-alpha, and a lysosomal hydrolase.

28. The method of claim 25, wherein said polypeptide acceptor is a glycopeptide.

29. The method of claim 25, further wherein said GalNAc moiety comprises a polyethylene glycol moiety.

30. The method of claim 25, wherein the product saccharide, product glycoprotein, or product glycopeptide is produced on a commercial scale.
T2-41R 



1 


BamHI

GAATTCGGAT CCAGGAAGGA GGACTGGAAT GAAATTGACC CCATTAAAAA GAAAGACCTT
CTTAAGCCTA GGTCCTTCCT CCTGACCTTA CTTTAACTGG GGTAATTTTT CTTTCTGGAA 


61 


CATCACAGCA ATGGAGAAGA GAAAGCACAA AGCATGGAGA CCCTCCCTCC AGGGAAAGTA 
GTAGTGTCGT TACCTCTTCT CTTTCGTGTT TCGTACCTCT GGGAGGGAGG TCCCTTTCAT 


121 


CGGTGGCCAG ACTTTAACCA GGAAGCTTAT GTTGGAGGGA CGATGGTCCG CTCCGGGCAG 
GCCACCGGTC TGAAATTGGT CCTTCGAATA CAACCTCCCT GCTACCAGGC GAGGCCCGTC 


181 


GACCCTTACG CCCGCAACAA GTTCAACCAG GTGGAGAGTG ATAAGCTTCG AATGGACAGA
CTGGGAATGC GGGCGTTGTT CAAGTTGGTC CACCTCTCAC TATTCGAAGC TTACCTGTCT 


241 


GCCATCCCTG ACACCCGGCA TGACCAGTGT CAGCGGAAGC AGTGGCGGGT GGATCTGCCG 
CGGTAGGGAC TGTGGGCCGT ACTGGTCACA GTCGCCTTCG TCACCGCCCA CCTAGACGGC 


301 


GCCACCAGCG TGGTGATCAC GTTTCACAAT GAAGCCAGGT CGGCCCTACT CAGGACCGTG
CGGTGGTCGC ACCACTAGTG CAAAGTGTTA CTTCGGTCCA GCCGGGATGA GTCCTGGCAC 


361


GTCAGCGTGC TTAAGAAAAG CCCGCCCCAT CTCATAAAAG AAATCATCTT GGTGGATGAC
CAGTCGCACG AATTCTTTTC GGGCGGGGTA GAGTATTTTC TTTAGTAGAA CCACCTACTG


421


TACAGCAATG ATCCTGAGGA CGGGGCTCTC TTGGGGAAAA TTGAGAAAGT GCGAGTTCTT
ATGTCGTTAC TAGGACTCCT GCCCCGAGAG AACCCCTTTT AACTCTTTCA CGCTCAAGAA


481


AGAAATGATC GACGAGAAGG CCTCATGCGC TCACGGGTTC GGGGGGCCGA TGCTGCCCAA 
TCTTTACTAG CTGCTCTTCC GGAGTACGCG AGTGCCCAAG CCCCCCGGCT ACGACGGGTT 


541 


GCCAAGGTCC TGACCTTCCT GGACAGTCAC TGCGAGTGTA ATGAGCACTG GCTGGAGCCC 
CGGTTCCAGG ACTGGAAGGA CCTGTCAGTG ACGCTCACAT TACTCGTGAC CGACCTCGGG 


601 


CTCCTGGAAA GGGTGGCGGA GGACAGGACT CGGGTTGTGT CACCCATCAT CGATGTCATT 
GAGGACCTTT CCCACCGCCT CCTGTCCTGA GCCCAACACA GTGGGTAGTA GCTACAGTAA 


661 


AATATGGACA ACTTTCAGTA TGTGGGGGCA TCTGCTGACT TGAAGGGCGG TTTTGATTGG 
TTATACCTGT TGAAAGTCAT ACACCCCCGT AGACGACTGA ACTTCCCGCC AAAACTAACC 


721 


AACTTGGTAT TCAAGTGGGA TTACATGACG CCTGAGCAGA GAAGGTCCCG GCAGGGGAAC 
TTGAACCATA AGTTCACCCT AATGTACTGC GGACTCGTCT CTTCCAGGGC CGTCCCCTTG 


781 


CCAGTCGCCC CTATAAAAAC CCCCATGATT GCTGGTGGGC TGTTTGTGAT GGATAAGTTC
GGTCAGCGGG GATATTTTTG GGGGTACTAA CGACCACCCG ACAAACACTA CCTATTCAAG 


841 


TATTTTGAAG AACTGGGGAA GTACGACATG ATGATGGATG TGTGGGGAGG AGAGAACCTA 
ATAAAACTTC TTGACCCCTT CATGCTGTAC TACTACCTAG ACACCCCTCC TCTCTTGGAT 


901 


GAGATCTCGT TCCGCGTGTG GCAGTGTGGT GGCAGCCTGG AGATCATCCC GTGCAGCCGT 
CTCTAGAGCA AGGCGCACAC CGTCACACCA CCGTCGGACC TCTAGTAGGG CACGTCGGCA 


961 


GTGGGACACG TGTTCCGGAA GCAGCACCCC TACACGTTCC CGGGTGGCAG TGGCACTGTC
CACCCTGTGC ACAAGGCCTT CGTCGTGGGG ATGTGCAAGG GCCCACCGTC ACCGTGACAG 


1021 


TTTGCCCGAA ACACCCGCCG GGCAGCAGAG GTCTGGATGG ATGAATACAA AAATTTCTAT 
AAACGGGCTT TGTGGGCGGC CCGTCGTCTC CAGACCTACC TACTTATGTT TTTAAAGATA 


1081 


TATGCAGCAG TGCCTTCTGC TAGAAACGTT CCTTATGGAA ATATTCAGAG CAGATTGGAG 
ATACGTCGTC ACGGAAGACG ATCTTTGCAA GGAATACCTT TATAAGTCTC GTCTAACCTC 
T2-41R


1141

KpnI

CTTAGGAAGA AACTCAGCTG CAAGCCTTTC AAATGGTACC TTGAAAATGT CTATCCAGAG
GAATCCTTCT TTGAGTCGAC GTTCGGAAAG TTTACCATGG AACTTTTACA GATAGGTCTC


1201


TTAAGGGTTC CAGACCATCA GGATATAGCT TTTGGGGCCT TGCAGCAGGG AACTAACTGC
AATTCCCAAG GTCTGGTAGT CCTATATCGA AAACCCCGGA ACGTCGTCCC TTGATTGACG


1261


CTCGACACTT TGGGACACTT TGCTGATGGT GTGGTTGGAG TTTATGAATG TCACAATGCT
GAGCTGTGAA ACCCTGTGAA ACGACTACCA CACCAACCTC AAATACTTAC AGTGTTACGA


1321


GGGGGAAACC AGGAATGGGC CTTGACGAAG GAGAAGTCGG TGAAGCACAT GGATTTGTGC
CCCCCTTTGG TCCTTACCCG GAACTGCTTC CTCTTCAGCC ACTTCGTGTA CCTAAACACG


1381


CTTACTGTGG TGGACCGGGC ACCGGGCTCT CTTATAAAGC TGCAGGGCTG CCGAGAAAAT
GAATGACACC ACCTGGCCCG TGGCCCGAGA GAATATTTCG ACGTCCCGAC GGCTCTTTTA


1441


GACAGCAGAC AGAAATGGGA ACAGATCGAG GGCAACTCCA AGCTGAGGCA CGTGGGCAGC
CTGTCGTCTG TCTTTACCCT TGTCTAGCTC CCGTTGAGGT TCGACTCCGT GCACCCGTCG


1501


AACCTGTGCC TGGACAGTCG CACGGCCAAG AGCGGGGGCC TAAGCGTGGA GGTGTGTGGC
TTGGACACGG ACCTGTCAGC GTGCCGGTTC TCGCCCCCGG ATTCGCACCT CCACACACCG


1561

EcoRI

CCGGCCCTTT CGCAGCAGTG GAAGTTCACG CTCAACCTGC AGCAGTAGCT CGAGGAATTC
GGCCGGGAAA GCGTCGTCAC CTTCAAGTGC GAGTTGGACG TCGTCATCGA GCTCCTTAAG
T2-52K

BamHI

EcoRI

1 GAATTCGGAT CCAAAAAGAA AGACCTTCAT CACAGCAATG GAGAAGAGAA AGCACAAAGC
CTTAAGCCTA GGTTTTTCTT TCTGGAAGTA GTGTCGTTAC CTCTTCTCTT TCGTGTTTCG

HindIII

61 ATGGAGACCC TCCCTCCAGG GAAAGTACGG TGGCCAGACT TTAACCAGGA AGCTTATGTT
TACCTCTGGG AGGGAGGTCC CTTTCATGCC ACCGGTCTGA AATTGGTCCT TCGAATACAA

121 GGAGGGACGA TGGTCCGCTC CGGGCAGGAC CCTTACGCCC GCAACAAGTT CAACCAGGTG
CCTCCCTGCT ACCAGGCGAG GCCCGTCCTG GGAATGCGGG CGTTGTTCAA GTTGGTCCAC

HindIII

181 GAGAGTGATA AGCTTCGAAT GGACAGAGCC ATCCCTGACA CCCGGCATGA CCAGTGTCAG
CTCTCACTAT TCGAAGCTTA CCTGTCTCGG TAGGGACTGT GGGCCGTACT GGTCACAGTC

241 CGGAAGCAGT GGCGGGTGGA TCTGCCGGCC ACCAGCGTGG TGATCACGTT TCACAATGAA
GCCTTCGTCA CCGCCCACCT AGACGGCCGG TGGTCGCACC ACTAGTGCAA AGTGTTACTT

301 GCCAGGTCGG CCCTACTCAG GACCGTGGTC AGCGTGCTTA AGAAAAGCCC GCCCCATCTC
CGGTCCAGCC GGGATGAGTC CTGGCACCAG TCGCACGAAT TCTTTTCGGG CGGGGTAGAG

361 ATAAAAGAAA TCATCTTGGT GGATGACTAC AGCAATGATC CTGAGGACGG GGCTCTCTTG
TATTTTCTTT AGTAGAACCA CCTACTGATG TCGTTACTAG GACTCCTGCC CCGAGAGAAC

421 GGGAAAATTG AGAAAGTGCG AGTTCTTAGA AATGATCGAC GAGAAGGCCT CATGCGCTCA
CCCTTTTAAC TCTTTCACGC TCAAGAATCT TTACTAGCTG CTCTTCCGGA GTACGCGAGT

481 CGGGTTCGGG GGGCCGATGC TGCCCAAGCC AAGGTCCTGA CCTTCCTGGA CAGTCACTGC
GCCCAAGCCC CCCGGCTACG ACGGGTTCGG TTCCAGGACT GGAAGGACCT GTCAGTGACG

541 GAGTGTAATG AGCACTGGCT GGAGCCCCTC CTGGAAAGGG TGGCGGAGGA CAGGACTCGG
CTCACATTAC TCGTGACCGA CCTCGGGGAG GACCTTTCCC ACCGCCTCCT GTCCTGAGCC

ClaI

601 GTTGTGTCAC CCATCATCGA TGTCATTAAT ATGGACAACT TTCAGTATGT GGGGGCATCT
CAACACAGTG GGTAGTAGCT ACAGTAATTA TACCTGTTGA AAGTCATACA CCCCCGTAGA

661 GCTGACTTGA AGGGCGGTTT TGATTGGAAC TTGGTATTCA AGTGGGATTA CATGACGCCT
CGACTGAACT TCCCGCCAAA ACTAACCTTG AACCATAAGT TCACCCTAAT GTACTGCGGA

721 GAGCAGAGAA GGTCCCGGCA GGGGAACCCA GTCGCCCCTA TAAAAACCCC CATGATTGCT
CTCGTCTCTT CCAGGGCCGT CCCCTTGGGT CAGCGGGGAT ATTTTTGGGG GTACTAACGA

781 GGTGGGCTGT TTGTGATGGA TAAGTTCTAT TTTGAAGAAC TGGGGAAGTA CGACATGATG
CCACCCGACA AACACTACCT ATTCAAGATA AAACTTCTTG ACCCCTTCAT GCTGTACTAC

841 ATGGATGTGT GGGGAGGAGA GAACCTAGAG ATCTCGTTCC GCGTGTGGCA GTGTGGTGGC
TACCTACACA CCCCTCCTCT CTTGGATCTC TAGAGCAAGG CGCACACCGT CACACCACCG

901 AGCCTGGAGA TCATCCCGTG CAGCCGTGTG GGACACGTGT TCCGGAAGCA GCACCCCTAC
TCGGACCTCT AGTAGGGCAC GTCGGCACAC CCTGTGCACA AGGCCTTCGT CGTGGGGATG

961 ACGTTCCCGG GTGGCAGTGG CACTGTCTTT GCCCGAAACA CCCGCCGGGC AGCAGAGGTC
TGCAAGGGCC CACCGTCACC GTGACAGAAA CGGGCTTTGT GGGCGGCCCG TCGTCTCCAG

1021 TGGATGGATG AATACAAAAA TTTCTATTAT GCAGCAGTGC CTTCTGCTAG AAACGTTCCT
ACCTACCTAC TTATGTTTTT AAAGATAATA CGTCGTCACG GAAGACGATC TTTGCAAGGA

1081 TATGGAAATA TTCAGAGCAG ATTGGAGCTT AGGAAGAAAC TCAGCTGCAA GCCTTTCAAA
ATACCTTTAT AAGTCTCGTC TAACCTCGAA TCCTTCTTTG AGTCGACGTT CGGAAAGTTT
T2-52K

KpnI

1141 TGGTACCTTG AAAATGTCTA TCCAGAGTTA AGGGTTCCAG ACCATCAGGA TATAGCTTTT
ACCATGGAAC TTTTACAGAT AGGTCTCAAT TCCCAAGGTC TGGTAGTCCT ATATCGAAAA

1201 GGGGCCTTGC AGCAGGGAAC TAACTGCCTC GACACTTTGG GACACTTTGC TGATGGTGTG
CCCCGGAACG TCGTCCCTTG ATTGACGGAG CTGTGAAACC CTGTGAAACG ACTACCACAC

1261 GTTGGAGTTT ATGAATGTCA CAATGCTGGG GGAAACCAGG AATGGGCCTT GACGAAGGAG
CAACCTCAAA TACTTACAGT GTTACGACCC CCTTTGGTCC TTACCCGGAA CTGCTTCCTC

1321 AAGTCGGTGA AGCACATGGA TTTGTGCCTT ACTGTGGTGG ACCGGGCACC GGGCTCTCTT
TTCAGCCACT TCGTGTACCT AAACACGGAA TGACACCACC TGGCCCGTGG CCCGAGAGAA

1381 ATAAAGCTGC AGGGCTGCCG AGAAAATGAC AGCAGACAGA AATGGGAACA GATCGAGGGC
TATTTCGACG TCCCGACGGC TCTTTTACTG TCGTCTGTCT TTACCCTTGT CTAGCTCCCG

1441 AACTCCAAGC TGAGGCACGT GGGCAGCAAC CTGTGCCTGG ACAGTCGCAC GGCCAAGAGC
TTGAGGTTCG ACTCCGTGCA CCCGTCGTTG GACACGGACC TGTCAGCGTG CCGGTTCTCG

1501 GGGGGCCTAA GCGTGGAGGT GTGTGGCCCG GCCCTTTCGC AGCAGTGGAA GTTCACGCTC
CCCCCGGATT CGCACCTCCA CACACCGGGC CGGGAAAGCG TCGTCACCTT CAAGTGCGAG

KpnI

1561 AACCTGCAGC AGTAGCTCGA GGAATTC
TTGGACGTCG TCATCGAGCT CCTTAAG
T2-74G

BamHI

EcoRI

1

GAATTCGGAT CCGGGAAAGT ACGGTGGCCA GACTTTAACC AGGAAGCTTA TGTTGGAGGG
CTTAAGCCTA GGCCCTTTCA TGCCACCGGT CTGAAATTGG TCCTTCGAAT ACAACCTCCC

61

ACGATGGTCC GCTCCGGGCA GGACCCTTAC GCCCGCAACA AGTTCAACCA GGTGGAGAGT
TGCTACCAGG CGAGGCCCGT CCTGGGAATG CGGGCGTTGT TCAAGTTGGT CCACCTCTCA

121

HindIII

GATAAGCTTC GAATGGACAG AGCCATCCCT GACACCCGGC ATGACCAGTG TCAGCGGAAG
CTATTCGAAG CTTACCTGTC TCGGTAGGGA CTGTGGGCCG TACTGGTCAC AGTCGCCTTC

181

CAGTGGCGGG TGGATCTGCC GGCCACCAGC GTGGTGATCA CGTTTCACAA TGAAGCCAGG
GTCACCGCCC ACCTAGACGG CCGGTGGTCG CACCACTAGT GCAAAGTGTT ACTTCGGTCC

241

TCGGCCCTAC TCAGGACCGT GGTCAGCGTG CTTAAGAAAA GCCCGCCCCA TCTCATAAAA
AGCCGGGATG AGTCCTGGCA CCAGTCGCAC GAATTCTTTT CGGGCGGGGT AGAGTATTTT

301

GAAATCATCT TGGTGGATGA CTACAGCAAT GATCCTGAGG ACGGGGCTCT CTTGGGGAAA
CTTTAGTAGA ACCACCTACT GATGTCGTTA CTAGGACTCC TGCCCCGAGA GAACCCCTTT

361

ATTGAGAAAG TGCGAGTTCT TAGAAATGAT CGACGAGAAG GCCTCATGCG CTCACGGGTT
TAACTCTTTC ACGCTCAAGA ATCTTTACTA GCTGCTCTTC CGGAGTACGC GAGTGCCCAA

421

CGGGGGGCCG ATGCTGCCCA AGCCAAGGTC CTGACCTTCC TGGACAGTCA CTGCGAGTGT
GCCCCCCGGC TACGACGGGT TCGGTTCCAG GACTGGAAGG ACCTGTCAGT GACGCTCACA

481

AATGAGCACT GGCTGGAGCC CCTCCTGGAA AGGGTGGCGG AGGACAGGAC TCGGGTTGTG
TTACTCGTGA CCGACCTCGG GGAGGACCTT TCCCACCGCC TCCTGTCCTG AGCCCAACAC

541

ClaI

TCACCCATCA TCGATGTCAT TAATATGGAC AACTTTCAGT ATGTGGGGGC ATCTGCTGAC
AGTGGGTAGT AGCTACAGTA ATTATACCTG TTGAAAGTCA TACACCCCCG TAGACGACTG

601

TTGAAGGGCG GTTTTGATTG GAACTTGGTA TTCAAGTGGG ATTACATGAC GCCTGAGCAG
AACTTCCCGC CAAAACTAAC CTTGAACCAT AAGTTCACCC TAATGTACTG CGGACTCGTC

661

AGAAGGTCCC GGCAGGGGAA CCCAGTCGCC CCTATAAAAA CCCCCATGAT TGCTGGTGGG
TCTTCCAGGG CCGTCCCCTT GGGTCAGCGG GGATATTTTT GGGGGTACTA ACGACCACCC

721

CTGTTTGTGA TGGATAAGTT CTATTTTGAA GAACTGGGGA AGTACGACAT GATGATGGAT
GACAAACACT ACCTATTCAA GATAAAACTT CTTGACCCCT TCATGCTGTA CTACTACCTA

781

GTGTGGGGAG GAGAGAACCT AGAGATCTCG TTCCGCGTGT GGCAGTGTGG TGGCAGCCTG
CACACCCCTC CTCTCTTGGA TCTCTAGAGC AAGGCGCACA CCGTCACACC ACCGTCGGAC

841

GAGATCATCC CGTGCAGCCG TGTGGGACAC GTGTTCCGGA AGCAGCACCC CTACACGTTC
CTCTAGTAGG GCACGTCGGC ACACCCTGTG CACAAGGCCT TCGTCGTGGG GATGTGCAAG

901

CCGGGTGGCA GTGGCACTGT CTTTGCCCGA AACACCCGCC GGGCAGCAGA GGTCTGGATG
GGCCCACCGT CACCGTGACA GAAACGGGCT TTGTGGGCGG CCCGTCGTCT CCAGACCTAC

961

GATGAATACA AAAATTTCTA TTATGCAGCA GTGCCTTCTG CTAGAAACGT TCCTTATGGA
CTACTTATGT TTTTAAAGAT AATACGTCGT CACGGAAGAC GATCTTTGCA AGGAATACCT

1021

KpnI

AATATTCAGA GCAGATTGGA GCTTAGGAAG AAACTCAGCT GCAAGCCTTT CAAATGGTAC
TTATAAGTCT CGTCTAACCT CGAATCCTTC TTTGAGTCGA CGTTCGGAAA GTTTACCATG

1081

KpnI

CTTGAAAATG TCTATCCAGA GTTAAGGGTT CCAGACCATC AGGATATAGC TTTTGGGGCC
GAACTTTTAC AGATAGGTCT CAATTCCCAA GGTCTGGTAG TCCTATATCG AAAACCCCGG



FIG. 12A 




T2-74G 



1141

TTGCAGCAGG GAACTAACTG CCTCGACACT TTGGGACACT TTGCTGATGG TGTGGTTGGA
AACGTCGTCC CTTGATTGAC GGAGCTGTGA AACCCTGTGA AACGACTACC ACACCAACCT

1201

GTTTATGAAT GTCACAATGC TGGGGGAAAC CAGGAATGGG CCTTGACGAA GGAGAAGTCG
CAAATACTTA CAGTGTTACG ACCCCCTTTG GTCCTTACCC GGAACTGCTT CCTCTTCAGC

1261

GTGAAGCACA TGGATTTGTG CCTTACTGTG GTGGACCGGG CACCGGGCTC TCTTATAAAG
CACTTCGTGT ACCTAAACAC GGAATGACAC CACCTGGCCC GTGGCCCGAG AGAATATTTC

1321

CTGCAGGGCT GCCGAGAAAA TGACAGCAGA CAGAAATGGG AACAGATCGA GGGCAACTCC
GACGTCCCGA CGGCTCTTTT ACTGTCGTCT GTCTTTACCC TTGTCTAGCT CCCGTTGAGG

1381

AAGCTGAGGC ACGTGGGCAG CAACCTGTGC CTGGACAGTC GCACGGCCAA GAGCGGGGGC
TTCGACTCCG TGCACCCGTC GTTGGACACG GACCTGTCAG CGTGCCGGTT CTCGCCCCCG

1441

CTAAGCGTGG AGGTGTGTGG CCCGGCCCTT TCGCAGCAGT GGAAGTTCAC GCTCAACCTG
GATTCGCACC TCCACACACC GGGCCGGGAA AGCGTCGTCA CCTTCAAGTG CGAGTTGGAC

XhoI

EcoRI

1501

CAGCAGTAGC TCGAGGAATT C
GTCGTCATCG AGCTCCTTAA G
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1 


Bam Hi T2-94 
EooRi "^"^^^^^^^^^'^'^ 

GAATTCGGAT CCGGGCAGGA CCCTTACGCC CGCAACAAGT TCAACCAGGT GGAGAGTGAT 
CTTAAGCCTA GGCCCGTCCT GGGAATGCGG GCGTTGTTCA AGTTGGTCCA CCTCTCACTA 


51 


Hindlll 

aarf^TTTRAA TnnArAGAGP CATCCCTGAC ACCCGGCATG ACCAGTGTCA GCGGAAGCAG 
TTCGAAGCTT ACCTGTCTCG GTAGGGACTG TGGGCCGTAC TGGTCACAGT CGCCTTCGTC 


121 


TGGCGGGTGG ATCTGCCGGA CACLAGLGIG GiGAiLALbi i iUAL-HHivjft aiji^unvjvjiuu 
ACCGCCCACC TAGACGGCCG GTGGTCGCAC CACTAGTGCA AAGTGTTACT TCGGTCCAGC 


181 


GCCCTACTCA GGACCGTGGT CAGCGTGCTT AAGAAAAGCC CGCCCCATCT CATAAAAGAA 
CGGGATGAGT CCTGGCACCA GTCGCACGAA TTCTTTTCGG GCGGGGTAGA GTATTTTCTT 


241 


ATCATCTTGG TGGATGACTA CAGCAATGAT CCTGAGGACG GGGCTCTCTT GGGGAAAATT 
TAGTAGAACC ACCTACTGAT GTCGTTACTA GGACTCCTGC CCCGAGAGAA CCCCTTTTAA 


301 


GAGAAAGTGC GAGTTCTTAG AAATGATCGA CGAGAAGGCC TCATGCGCTC ACGGGTTCGG
CTCTTTCACG CTCAAGAATC TTTACTAGCT GCTCTTCCGG AGTACGCGAG TGCCCAAGCC 


361 


GGGGCCGATG CTGCCCAAGC CAAGGTCCTG ACCTTCCTGG ACAGTCACTG CGAGTGTAAT
CCCCGGCTAC GACGGGTTCG GTTCCAGGAC TGGAAGGACC TGTCAGTGAC GCTCACATTA


421 


GAGCACTGGC TGGAGCCCCT CCTGGAAAGG GTGGCGGAGG ACAGGACTCG GGTTGTGTCA 
CTCGTGACCG ACCTCGGGGA GGACCTTTCC CACCGCCTCC TGTCCTGAGC CCAACACAGT 


481 


CCCATCATCG ATGTCATTAA TATGGACAAC TTTCAGTATG TGGGGGCATC TGCTGACTTG 
GGGTAGTAGC TACAGTAATT ATACCTGTTG AAAGTCATAC ACCCCCGTAG ACGACTGAAC 


541 


AAGGGCGGTT TTGATTGGAA CTTGGTATTC AAGTGGGATT ACATGACGCC TGAGCAGAGA
TTCCCGCCAA AACTAACCTT GAACCATAAG TTCACCCTAA TGTACTGCGG ACTCGTCTCT 


601 


AGGTCCCGGC AGGGGAACCC AGTCGCCCCT ATAAAAACCC CCATGATTGC TGGTGGGCTG 
TCCAGGGCCG TCCCCTTGGG TCAGCGGGGA TATTTTTGGG GGTACTAACG ACCACCCGAC 


661 


TTTGTGATGG ATAAGTTCTA TTTTGAAGAA CTGGGGAAGT ACGACATGAT GATGGATGTG 
AAACACTACC TATTCAAGAT AAAACTTCTT GACCCCTTCA TGCTGTACTA CTACCTACAC 


721 


TGGGGAGGAG AGAACCTAGA GATCTCGTTC CGCGTGTGGC AGTGTGGTGG CAGCCTGGAG 
ACCCCTCCTC TCTTGGATCT CTAGAGCAAG GCGCACACCG TCACACCACC GTCGGACCTC


781 


ATCATCCCGT GCAGCCGTGT GGGACACGTG TTCCGGAAGC AGCACCCCTA CACGTTCCCG 
TAGTAGGGCA CGTCGGCACA CCCTGTGCAC AAGGCCTTCG TCGTGGGGAT GTGCAAGGGC 


841 


GGTGGCAGTG GCACTGTCTT TGCCCGAAAC ACCCGCCGGG CAGCAGAGGT CTGGATGGAT 
CCACCGTCAC CGTGACAGAA ACGGGCTTTG TGGGCGGCCC GTCGTCTCCA GACCTACCTA


901 


GAATACAAAA ATTTCTATTA TGCAGCAGTG CCTTCTGCTA GAAACGTTCC TTATGGAAAT
CTTATGTTTT TAAAGATAAT ACGTCGTCAC GGAAGACGAT CTTTGCAAGG AATACCTTTA


961 


ATTCAGAGCA GATTGGAGCT TAGGAAGAAA CTCAGCTGCA AGCCTTTCAA ATGGTACCTT 
TAAGTCTCGT CTAACCTCGA ATCCTTCTTT GAGTCGACGT TCGGAAAGTT TACCATGGAA 


1021 


GAAAATGTCT ATCCAGAGTT AAGGGTTCCA GACCATCAGG ATATAGCTTT TGGGGCCTTG 
CTTTTACAGA TAGGTCTCAA TTCCCAAGGT CTGGTAGTCC TATATCGAAA ACCCCGGAAC 


1081 


CAGCAGGGAA CTAACTGCCT CGACACTTTG GGACACTTTG CTGATGGTGT GGTTGGAGTT
GTCGTCCCTT GATTGACGGA GCTGTGAAAC CCTGTGAAAC GACTACCACA CCAACCTCAA 



FIG. 13A 
T2-94

1141 TATGAATGTC ACAATGCTGG GGGAAACCAG GAATGGGCCT TGACGAAGGA GAAGTCGGTG
     ATACTTACAG TGTTACGACC CCCTTTGGTC CTTACCCGGA ACTGCTTCCT CTTCAGCCAC

1201 AAGCACATGG ATTTGTGCCT TACTGTGGTG GACCGGGCAC CGGGCTCTCT TATAAAGCTG
     TTCGTGTACC TAAACACGGA ATGACACCAC CTGGCCCGTG GCCCGAGAGA ATATTTCGAC

1261 CAGGGCTGCC GAGAAAATGA CAGCAGACAG AAATGGGAAC AGATCGAGGG CAACTCCAAG
     GTCCCGACGG CTCTTTTACT GTCGTCTGTC TTTACCCTTG TCTAGCTCCC GTTGAGGTTC

1321 CTGAGGCACG TGGGCAGCAA CCTGTGCCTG GACAGTCGCA CGGCCAAGAG CGGGGGCCTA
     GACTCCGTGC ACCCGTCGTT GGACACGGAC CTGTCAGCGT GCCGGTTCTC GCCCCCGGAT

1381 AGCGTGGAGG TGTGTGGCCC GGCCCTTTCG CAGCAGTGGA AGTTCACGCT CAACCTGCAG
     TCGCACCTCC ACACACCGGG CCGGGAAAGC GTCGTCACCT TCAAGTGCGA GTTGGACGTC

     XhoI  EcoRI

1441 CAGTAGCTCG AGGAATTC
     GTCATCGAGC TCCTTAAG



FIG. 13B
AC#         Sample Description  Activity (U/L)  %RSD  Volume (ml)  Activity (U)
AC04-08477  040603 1b A6        0.10            22.3  45           0.00
AC04-08478  040603 1b B7        0.20            42.4  45           0.01
AC04-08479  040603 1b B6        0.15            10.7  45           0.01
AC04-08480  040603 1b B5        0.19            78.4  45           0.01
AC04-08481  040603 1b B4        0.06            0.0   45           0.00
AC04-08482  040603 1b B3        0.10            4.9   45           0.00
AC04-08483  040603 1b B2        0.39            76.4  45           0.02
AC04-08484  040603 1b C6        0.02            35.1               0.00
AC04-08485  040603 1b FR        0.04            2.2   1000         0.04
AC04-08486  040603 1b cFR       0.09            2.3   250          0.02
AC04-08487  040603 1b CP        -0.01           8.4   750          0.00
AC04-08488  040603 1b OP        -0.02           9.0   1000         -0.02
AC04-08489  040603 1b QL        0.03            0.6   245          0.01
AC04-08490  040603 1b QFT       -0.04           8.3   245          -0.01
AC04-08491  040603 1b QW        -0.02           19.8  48           0.00



FIG. 14C 
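
The last column of the table above appears to be the measured activity concentration multiplied by the fraction volume. The sketch below illustrates that relationship under two assumptions that are not stated on the sheet: that the third column is in U/L, and that totals are rounded to two decimals. The function name and the sample labels as read here are illustrative only.

```python
def total_activity_u(activity_u_per_l: float, volume_ml: float) -> float:
    """Total units in a fraction: concentration (U/L) times volume converted from mL to L."""
    return activity_u_per_l * volume_ml / 1000.0


# (sample, activity in U/L, volume in mL) as read from three rows of the table.
rows = [
    ("040603 1b B2", 0.39, 45),
    ("040603 1b FR", 0.04, 1000),
    ("040603 1b cFR", 0.09, 250),
]

for sample, u_per_l, ml in rows:
    print(f"{sample}: {total_activity_u(u_per_l, ml):.2f} U")
```

Because the 45 ml column fractions are small, their totals round to a few hundredths of a unit even where the concentration is comparatively high.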
AC#         Sample Description            Activity (U/L)  %RSD  Volume (ml)  Activity (U)
            Negative Control (No Enzyme)  0.00            ?
AC04-08462  040603 1a FR                  1.08            24.0  1000         1.08
AC04-08463  040603 1a cFR                 1.03            4.4   250          0.26
AC04-08464  040603 1a CP                  0.13            17.8  750          0.09
AC04-08465  040603 1a DP                  0.08            5.7   1000         0.08
AC04-08466  040603 1a QFT                 0.07            34.0  270          0.02
AC04-08467  040603 1a QW                  0.05            24.4  48           0.00
AC04-08468  040603 1a B12-10              1.60            9.0   45           0.07
AC04-08469  040603 1a B9-7                2.21            11.7  45           0.10
AC04-08470  040603 1a B6-4                0.88            17.0  45           0.04
AC04-08471  040603 1a B3-1                0.43            4.4   45           0.02
AC04-08472  040603 1a C1-3                0.36            22.2  45           0.02
AC04-08473  040603 1a C4-6                0.32            4.5   45           0.01
AC04-08474  040603 1a C7-9                0.22            1.2   45           0.01
AC04-08475  040603 1a C10-12              0.2?            71.?  45           0.01
AC04-08476  040603 1a XI                  ?               6.?   2?           0.00

? marks a digit or entry that is illegible in the source.



FIG. 15C 
[Table printed rotated on the sheet; only partly legible, figure label illegible. Legible entries are listed per column in the order printed.]

AC#: AC04-08493, AC04-08494, AC04-08495, AC04-08496, AC04-08497, AC04-08498, AC04-08500, AC04-08501, AC04-08503, AC04-08504, AC04-08506
Sample Description: 040603 2a A6, 2a B7, 2a B6, 2a B5, 2a B4, 2a B3, 2a B2, 2a B1, 2a C6, 2a CP, 2a DP, 2a QFTI, 2a QW
Activity (U/L): 4.14, 0.52, 0.12, 0.16, 0.07, 0.30, -0.21
%RSD: 18.4, 24.8, 73.3
Volume (ml): 1000
Activity (U): 0.15, 0.02, -0.05
A280: 0.027, 0.064, 0.096, 0.223, 0.651, 0.000, 0.284, 0.068
Conc (mg/mL): 0.0179, 0.0424, 0.0636, 0.148, 0.431, 0.188, 0.045
Mass (mg): 0.806, 1.908, 2.862, 114.2, 16.92
Specif. Act. (U/mg): 0.186, 0.099, 0.028
[Table printed rotated on the sheet; only partly legible, figure label illegible. Legible entries are listed per column in the order printed.]

AC#: AC04-08522, AC04-08523, AC04-08524, AC04-08525, AC04-08526, AC04-08527, AC04-08528, AC04-08529, AC04-08530, AC04-08531, AC04-08532, AC04-08533, AC04-08534, AC04-08535
Sample Description: 040603 3 A6, 3 B7, 3 B6, 3 B5, 3 B4, 3 B3, 3 B2, 3 B1, 3 C1, 3 CP, 3 DP, 3 QL, 3 QFTI, 3 QW
Activity (U/L): 0.01, -0.03, -0.03
%RSD: 14.4, 47.1
Volume (ml): illegible
Activity (U): 0.01, 0.00, 0.00
[Table printed rotated on the sheet; only partly legible, figure label illegible. Legible entries are listed per column in the order printed.]

Sample: Load, Wash, A6, B7, B5-B1
Volume (mL): illegible
Activity (U/L): 2.73, 0.02, 7.74, 10.90*, 1.15*
A280/1.51 (mg/mL): 0.385, 0.103, 0.010, 0.107, 0.144, 0.165, 0.319
Mass (mg): illegible
Specific Activity (U/mg): 0.007, 0.073, 0.075, 0.007
[Table printed rotated on the sheet; only partly legible, figure label illegible. Legible entries are listed per column in the order printed.]

Sample: Load, Wash, A4, A5, A6-A9
Volume (mL): illegible
Activity (U/L): 0.41, 1.82
A280: 0.550, -0.017, -0.008, 0.000, 0.048, 0.100, 0.422
A280/1.51 (mg/mL): 0.364, 0.032, 0.066, 0.279
Activity (mU): 49.2, 22.4
Mass (mg): 14.6
Specific Activity (U/mg): 0.003, 0.057, 0.032, 0.004
[Table printed rotated on the sheet; only partly legible, figure label illegible. Legible entries are listed per column in the order printed.]

Sample: Load, Wash, A2, A4, A6-A9
Volume (mL): illegible
Activity (U/L): 1.27, 0.41, 0.07, 0.40
A280: 0.550, 0.001, -0.007, -0.018, 0.004, 0.049, 0.136, 0.418
A280/1.51 (mg/mL): 0.364, 0.007, 0.003, 0.032, 0.090, 0.277
Activity (mU): 16.4
Mass (mg): 14.6
Specific Activity (U/mg): 0.0035, 0.125, 0.0275, 0.001
[Table printed rotated on the sheet; figure label illegible.]

Sample  Volume (mL)  Activity (U/L)  A280   A280/1.51 (mg/mL)  Activity (mU)  Mass (mg)  Specific Activity (U/mg)
Load    ?            1.25            0.561  0.372              50.0           14.9       0.0034
?       ?            0.71            0.019  0.012              28.4           0.48       0.059
Wash    ?            0.48            0.026  0.017              ?              ?          0.028
A3      ?            ?               0.011  0.007              ?              ?          0.015
A4      ?            ?               0.062  0.041              ?              ?          0.003
A5-A9   ?            ?               0.311  0.206              ?              5.15       0.0002

? marks an entry that is illegible in the source.
[Table printed rotated on the sheet; figure label illegible.]

Sample  Volume (mL)  Activity (U/L)  A280   A280/1.51 (mg/mL)  Activity (mU)  Mass (mg)  Specific Activity (U/mg)
Load    ?            1.20            0.579  0.383              48.0           15.3       0.0031
?       ?            0.81            0.151  0.100              32.4           4.00       0.0081
Wash    ?            0.01            0.128  0.085                             0.66
A3      ?            1.65            0.158  0.105              8.25           0.53       0.0156
A4      ?            3.76            0.273  0.181              ?              0.91       0.0206
A5/6    ?            0.01            0.093  0.062              ?              0.62       0.0002

? marks an entry that is illegible in the source.
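
The column headings of the purification tables above suggest how the derived quantities relate to the measured ones: protein concentration is A280 divided by 1.51, mass is concentration times volume, total activity in mU is the U/L reading times the volume in mL, and specific activity is total activity per milligram. The following is a minimal sketch of that arithmetic, not the authors' worksheet. The class and attribute names are hypothetical, and the 40 mL load volume is an assumption back-calculated from the printed mass and concentration, since the volume entry itself is not legible here.

```python
from dataclasses import dataclass

# Divisor taken from the column heading "A280/1.51 (mg/mL)"; treating it as the
# absorbance of a 1 mg/mL solution is an assumption.
A280_PER_MG_PER_ML = 1.51


@dataclass
class Fraction:
    name: str
    volume_ml: float
    activity_u_per_l: float
    a280: float

    @property
    def conc_mg_per_ml(self) -> float:
        return self.a280 / A280_PER_MG_PER_ML

    @property
    def mass_mg(self) -> float:
        return self.conc_mg_per_ml * self.volume_ml

    @property
    def activity_mu(self) -> float:
        # 1 U/L equals 1 mU/mL, so U/L times mL gives mU directly.
        return self.activity_u_per_l * self.volume_ml

    @property
    def specific_activity_u_per_mg(self) -> float:
        return (self.activity_mu / 1000.0) / self.mass_mg


# "Load" row: A280 0.579 and 1.20 U/L are printed; 40 mL is the assumed volume.
load = Fraction("Load", volume_ml=40.0, activity_u_per_l=1.20, a280=0.579)

print(f"{load.name}: {load.conc_mg_per_ml:.3f} mg/mL, {load.mass_mg:.1f} mg, "
      f"{load.activity_mu:.1f} mU, {load.specific_activity_u_per_mg:.4f} U/mg")
```

Under these assumptions the computed values agree with the printed Load row (0.383 mg/mL, 15.3 mg, 48.0 mU, 0.0031 U/mg), which supports reading the columns this way.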
GalNAcT2 activities of refolded MBP-GalNAcT2(Δ51)

pH 6.5-6.5    pH 8-6.5    pH 6.5-8    pH 8-8

FIG. 25
pH effect on the MBP-GalNAcT2(Δ51) specific activities

After dial (mU/mL)

pH 6.5-6.5    pH 8-6.5    pH 6.5-8    pH 8-8

pH Solubiliz. - refold

FIG. 26